curious postgres (crash) recovery behavior

Поиск
Список
Период
Сортировка
От Zwettler Markus (OIZ)
Тема curious postgres (crash) recovery behavior
Дата
Msg-id 3bf6e04d6433401e9249517419f76ff6@zuerich.ch
обсуждение исходный текст
Ответы Re: curious postgres (crash) recovery behavior  (Rébeli-Szabó Tamás <pub@rblst.info>)
Список pgsql-general

We are using Crunchy PGO which uses "pgbackrest".

Over Christmas we had a disk full error on the "pgbackrest" repo followed by a disk full error on the PostgreSQL instance pod.

Unfortunately, a colleague then deleted the current "pg_wal" directory on the instance pod.

So we had to do a point-in-time recovery to overcome this situation.

 

 

 

we started a PITR to 2022-12-23 01:34 based on the following "pgbackrest" backup:

 

sh-4.4$ pgbackrest info

stanza: db

    status: ok

    cipher: none

 

    db (current)

        wal archive min/max (13): 0000000B0000010B000000C1/0000000C000001110000000A

 

        full backup: 20221222-230004F

            timestamp start/stop: 2022-12-22 23:00:04 / 2022-12-23 01:32:17

            wal start/stop: 0000000B0000010B000000C1 / 0000000B0000010C0000005C

            database size: 46.3GB, database backup size: 46.3GB

            repo1: backup set size: 17.6GB, backup size: 17.6GB

 

 

 

During the point-in-time recovery, the "pgbackrest" pod terminated incorrectly with an error "pg_ctl: server did not start in time".

There is a known PGO bug on this (pg_ctl default timeout of 60 secs cannot be changed).

 

 

PGO started the instance pod anyway in following, which leads to a regular PostgreSQL crash recovery ending with "last completed transaction was at log time 2022-12-23 20:52:29.584555+01":

...

2023-01-04 15:26:35 CET : =>@ : 94-4=>63b58c9b.5e : 00000 LOG:  starting PostgreSQL 13.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit

2023-01-04 15:26:35 CET : =>@ : 94-5=>63b58c9b.5e : 00000 LOG:  listening on IPv4 address "0.0.0.0", port 5432

2023-01-04 15:26:35 CET : =>@ : 94-6=>63b58c9b.5e : 00000 LOG:  listening on IPv6 address "::", port 5432

2023-01-04 15:26:35 CET : =>@ : 94-7=>63b58c9b.5e : 00000 LOG:  listening on Unix socket "/tmp/postgres/.s.PGSQL.5432"

2023-01-04 15:26:35 CET : =>@ : 108-1=>63b58c9b.6c : 00000 LOG:  database system was interrupted while in recovery at log time 2022-12-23 01:06:58 CET

2023-01-04 15:26:35 CET : =>@ : 108-2=>63b58c9b.6c : 00000 HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

2023-01-04 15:26:35 CET : =>@ : 108-3=>63b58c9b.6c : 00000 LOG:  entering standby mode

2023-01-04 15:26:35 CET : =>@ : 108-4=>63b58c9b.6c : 00000 LOG:  restored log file "0000000B.history" from archive

2023-01-04 15:26:35 CET : =>@ : 108-5=>63b58c9b.6c : 00000 LOG:  restored log file "0000000B0000010C0000000C" from archive

...

2023-01-04 15:30:06 CET : =>@ : 108-1198=>63b58c9b.6c : 00000 LOG:  restored log file "0000000B00000110000000AD" from archive

2023-01-04 15:30:06 CET : =>@ : 108-1199=>63b58c9b.6c : 00000 LOG:  received promote request

2023-01-04 15:30:06 CET : =>@ : 108-1200=>63b58c9b.6c : 00000 LOG:  redo done at 110/AD0005B8

2023-01-04 15:30:06 CET : =>@ : 108-1201=>63b58c9b.6c : 00000 LOG:  last completed transaction was at log time 2022-12-23 20:52:29.584555+01

2023-01-04 15:30:07 CET : =>@ : 108-1202=>63b58c9b.6c : 00000 LOG:  restored log file "0000000B00000110000000AD" from archive

2023-01-04 15:30:07 CET : =>@ : 108-1203=>63b58c9b.6c : 00000 LOG:  selected new timeline ID: 12

2023-01-04 15:30:07 CET : =>@ : 108-1204=>63b58c9b.6c : 00000 LOG:  archive recovery complete

2023-01-04 15:30:07 CET : =>@ : 108-1205=>63b58c9b.6c : 00000 LOG:  restored log file "0000000B.history" from archive

2023-01-04 15:30:07 CET : =>@ : 94-9=>63b58c9b.5e : 00000 LOG:  database system is ready to accept connections

...

 

 

 

QUESTION:

=========

I wondered how the crash recovery can run to 2022-12-23 20:52:29.584555 and simply open the database?

 

When I got it right the full backup itself would be consistent with "wal stop 0000000B0000010C0000005C".

PostgreSQL additionally added some wal files until 0000000B00000110000000AD (2022-12-23 20:52:29.584555) because it didn't find any more before the recovery target of 2022-12-23 01:34.

 

Am I right or did I miss something?

 

 

В списке pgsql-general по дате отправления:

Предыдущее
От: Ron
Дата:
Сообщение: Re: REINDEX vs VACUUM
Следующее
От: Rébeli-Szabó Tamás
Дата:
Сообщение: Re: REINDEX vs VACUUM