Thread: BUG #16159: recovery requests WALs for the next timelines before timeline switch LSN has been reached

BUG #16159: recovery requests WALs for the next timelines before timeline switch LSN has been reached

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16159
Logged by:          Pavel Suderevsky
Email address:      psuderevsky@gmail.com
PostgreSQL version: 11.6
Operating system:   CentOS 7.6.1810 (3.10.0-957.el7.x86_64)
Description:

Reproduced 11.2,11.6.

If PostgreSQL starts recovery and finds a history file for a timeline higher
than the current one, it first requests the segment file for the future
timeline (which most likely doesn't exist yet) and only then requests the
segment file for the current timeline.
If the archive is located on remote storage, it can take a long time to
discover that the segments for the future timelines do not exist yet, so
recovery can take far longer than necessary.

Example:

recovery.conf:
>restore_command = 'echo -e "Searching WAL: %f, location: %p";
/usr/bin/pgbackrest --stanza=platform archive-get %f "%p"'
>recovery_target_timeline = 'latest'
>standby_mode = 'on'

Postgres log during startup:
>
> 2019-12-06 07:11:16 CST  LOG:  database system was shut down in recovery
> at 2019-12-06 07:11:08 CST
> Searching WAL: 00000022.history, location: pg_wal/RECOVERYHISTORY
> 2019-12-06 07:11:16 CST  LOG:  restored log file "00000022.history" from
> archive
> Searching WAL: 00000023.history, location: pg_wal/RECOVERYHISTORY
> 2019-12-06 07:11:16 CST  LOG:  entering standby mode
> Searching WAL: 00000022.history, location: pg_wal/RECOVERYHISTORY
> 2019-12-06 07:11:16 CST  LOG:  restored log file "00000022.history" from
> archive
> Searching WAL: 00000022000018C60000003F, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018C60000003F, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:20 CST  LOG:  restored log file
> "00000021000018C60000003F" from archive
> Searching WAL: 00000021.history, location: pg_wal/RECOVERYHISTORY
> 2019-12-06 07:11:20 CST  LOG:  restored log file "00000021.history" from
> archive
> Searching WAL: 00000022000018BF0000001B, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001B, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:27 CST  LOG:  restored log file
> "00000021000018BF0000001B" from archive
> 2019-12-06 07:11:27 CST  LOG:  redo starts at 18BF/1B311260
> Searching WAL: 00000022000018BF0000001C, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001C, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:34 CST  LOG:  restored log file
> "00000021000018BF0000001C" from archive
> Searching WAL: 00000022000018BF0000001D, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001D, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:40 CST  LOG:  restored log file
> "00000021000018BF0000001D" from archive
> Searching WAL: 00000022000018BF0000001E, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001E, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:46 CST  LOG:  restored log file
> "00000021000018BF0000001E" from archive
> Searching WAL: 00000022000018BF0000001F, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001F, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:53 CST  LOG:  restored log file
> "00000021000018BF0000001F" from archive

As you can see, Postgres tries to restore 00000022* WAL segments before the
timeline switch LSN has been reached while restoring the 00000021* segments.
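For reference, the paired requests above make sense once you decode the file names: a WAL segment name is 24 hex digits encoding the timeline, then the two halves of the starting LSN. A minimal Python sketch of that mapping (assuming the default 16MB wal_segment_size; these helpers are illustrative, not part of PostgreSQL):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default wal_segment_size (16MB)

def parse_walfile(name: str):
    """Split a 24-hex-digit WAL file name into (timeline, start LSN)."""
    tli = int(name[0:8], 16)   # timeline ID
    log = int(name[8:16], 16)  # high 32 bits of the LSN
    seg = int(name[16:24], 16) # segment number within that 4GB "log"
    return tli, (log << 32) | (seg * WAL_SEG_SIZE)

def walfile_for(tli: int, lsn: int) -> str:
    """Name of the segment on timeline `tli` that contains `lsn`."""
    return f"{tli:08X}{lsn >> 32:08X}{(lsn & 0xFFFFFFFF) // WAL_SEG_SIZE:08X}"

# 00000022000018BF0000001B and 00000021000018BF0000001B differ only in the
# timeline part: the startup process is probing two timelines for the same
# 16MB LSN range, highest timeline first.
```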


Re: BUG #16159: recovery requests WALs for the next timelines before timeline switch LSN has been reached

From: Kyotaro Horiguchi
Date:
Hello.

In short, it is not a bug.

At Wed, 11 Dec 2019 12:39:20 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in 
> The following bug has been logged on the website:
> 
> Bug reference:      16159
> Logged by:          Pavel Suderevsky
> Email address:      psuderevsky@gmail.com
> PostgreSQL version: 11.6
> Operating system:   CentOS 7.6.1810 (3.10.0-957.el7.x86_64)
> Description:        
> 
> Reproduced 11.2,11.6.
> 
> If PostgreSQL starts recovery and finds a history file for a timeline that
> is higher than current one, it will request file with the segment for the
> future timeline (that most likely doesn't exist yet) and only then it will
> request file with the segment for current timeline.

The cause of the "future" timeline is that the standby has received the
history file for TLI=22 but has not yet finished replaying the first
checkpoint after the promotion. In that case, WAL files from before the
timeline switch should not exist for TLI=22, and PostgreSQL verifies that
by peeking into the archive for the file.

Since the standby always starts archive recovery from the REDO location of
the last checkpoint performed on the standby (or the restart point), the
amount of WAL to read is unrelated to the promotion.

> If archive is located on remote storage it can take huge time to find that
> segments for the future timelines are not exist yet and therefore recovery
> can take too long.

I don't think that probing for non-existent remote files takes an amount of
time comparable to a 16MB transfer. If the problem is the amount of WAL
files to transfer during recovery, I can think of three ways to make
standby startup faster.

1. For operational shutdown/restarts, make sure that the latest
 restart point is close enough to the replay location on the standby
 before shutting down. If not, manual checkpoint on the master then
 that on the standby would help. The functions pg_control_checkpoint()
 and pg_last_wal_replay_lsn() would work for checking that condition.
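As an illustration of that check, the two LSNs those functions return can be compared by parsing them as 64-bit positions (a sketch with hypothetical example values, not any PostgreSQL API):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string like '18BF/1B311260' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Hypothetical values: redo LSN of the last restart point (from
# pg_control_checkpoint()) vs. the current replay location (from
# pg_last_wal_replay_lsn()).
redo_lsn = lsn_to_int("18BF/1B000000")
replay_lsn = lsn_to_int("18BF/1B311260")

# WAL that would have to be re-read from the archive after a restart;
# a shutdown is "cheap" when this gap is small.
lag_bytes = replay_lsn - redo_lsn
```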

2. PostgreSQL 11 accepts "always" for the archive_mode GUC setting. It
 enables standby-side archiving.

https://www.postgresql.org/docs/11/runtime-config-wal.html#GUC-ARCHIVE-MODE

3. Decrease max_wal_size or checkpoint_timeout on the master, and/or
 decrease checkpoint_timeout on the standby.  This decreases the
 amount of time needed during recovery.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #16159: recovery requests WALs for the next timelines before timeline switch LSN has been reached

From: Pavel Suderevsky
Date:
Hi,

Thank you very much for your explanation, and sorry for the delay in answering.
It still seems to me, though, that PostgreSQL has enough information to determine that no WALs exist for the new timeline and thus avoid probing for every possibly-existing WAL.
0000005300004AE1000000A3:
rmgr: Standby     len (rec/tot):     62/    62, tx:          0, lsn: 4AE1/A3000028, prev 4AE1/A28EC5F8, desc: RUNNING_XACTS nextXid 82249762 latestCompletedXid 82249758 oldestRunningXid 82249759; 1 xacts: 82249759
rmgr: XLOG        len (rec/tot):    106/   106, tx:          0, lsn: 4AE1/A3000068, prev 4AE1/A28EC5F8, desc: CHECKPOINT_SHUTDOWN redo 4AE1/A3000068; tli 83; prev tli 83; fpw true; xid 0:82249762; oid 1074976; multi 144; offset 4568; oldest xid 562 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: XLOG        len (rec/tot):     24/    24, tx:          0, lsn: 4AE1/A30000D8, prev 4AE1/A3000068, desc: SWITCH

0000005400004AE1000000A4:
rmgr: XLOG        len (rec/tot):    106/   106, tx:          0, lsn: 4AE1/A4000028, prev 4AE1/A30000D8, desc: CHECKPOINT_SHUTDOWN redo 4AE1/A4000028; tli 83; prev tli 83; fpw true; xid 0:82249762; oid 1074976; multi 144; offset 4568; oldest xid 562 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: XLOG        len (rec/tot):     42/    42, tx:          0, lsn: 4AE1/A4000098, prev 4AE1/A4000028, desc: END_OF_RECOVERY tli 84; prev tli 83; time 2020-01-28 06:29:03.432938 CST

00000054.history:
83      4AE1/A4000098   no recovery target specified

It could simply look through the first received new-timeline WAL segment and confirm that the timeline switch occurred within it. After that, it would only need to check the archive for the single possibly-existing previous-timeline WAL.
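The information Pavel points at is already encoded in the timeline history file: each line gives a parent timeline and the LSN at which recovery switched away from it, so for any LSN below the switch point only the older timeline's segment can exist. A simplified sketch of that selection (illustrative only, not the server's actual code):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string like '4AE1/A4000098' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def parse_history(text: str):
    """Parse a .history file: lines of 'parentTLI  switchLSN  reason'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        entries.append((int(fields[0]), lsn_to_int(fields[1])))
    return entries

def tli_for_lsn(entries, latest_tli: int, lsn: int) -> int:
    """Return the timeline containing `lsn`.

    A segment on a newer timeline can only hold records at or after that
    timeline's switch point, so there is no need to probe the archive for
    a newer-timeline segment below it.
    """
    for tli, switch_lsn in entries:
        if lsn < switch_lsn:
            return tli
    return latest_tli

# Using the 00000054.history content from the example above: records
# before 4AE1/A4000098 live on TLI 83, later ones on TLI 84.
history = parse_history("83\t4AE1/A4000098\tno recovery target specified")
```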

Regarding the impact: the issue is not a large amount of WAL to apply, but the searches for non-existent WALs on remote storage; each such search can take 5-10 seconds, while fetching an existing WAL takes milliseconds.

Regards,
Pavel Suderevsky

On Thu, 12 Dec 2019 at 06:49, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> [...]

Re: Don't try fetching future segment of a TLI.

From: Grigory Smolkin
Date:

I've bumped into this issue recently:
https://www.postgresql.org/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru


On 4/6/20 8:43 PM, Fujii Masao wrote:

> The patch looks good to me. Attached is the updated version of the patch.
> I updated only comments.
>
> Barring any objection, I will commit this patch.

I've been running tests on your patch. So far so good.

On Tue, Apr 07, 2020 at 12:15:00PM +0900, Fujii Masao wrote:

> I understood the situation and am fine to back-patch that. But I'm not sure
> if it's fair to do that. Maybe we need to hear more opinions about this?
> OTOH, feature freeze for v13 is today, so what about committing the patch
> in v13 at first, and then doing the back-patch after hearing opinions and
> receiving many +1?

+1 to back-patching it.

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company