Re: [HACKERS] make async slave to wait for lsn to be replayed

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: [HACKERS] make async slave to wait for lsn to be replayed
Дата
Msg-id CAPpHfdvGRssjqwX1+idm5Tu-eWsTcx6DthB2LhGqA1tZ29jJaw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] make async slave to wait for lsn to be replayed  (Kartyshov Ivan <i.kartyshov@postgrespro.ru>)
Ответы Re: [HACKERS] make async slave to wait for lsn to be replayed
Список pgsql-hackers
Hi, Ivan!

On Wed, Jun 12, 2024 at 11:36 AM Kartyshov Ivan
<i.kartyshov@postgrespro.ru> wrote:
>
> Hi, Alexander, Here, I made some improvements according to your
> discussion with Heikki.
>
> On 2024-04-11 18:09, Alexander Korotkov wrote:
> > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka@iki.fi>
> > wrote:
> >> In a nutshell, it's possible for the loop in WaitForLSN to exit
> >> without
> >> cleaning up the process from the heap. I was able to hit that by
> >> adding
> >> a delay after the addLSNWaiter() call:
> >>
> >> > TRAP: failed Assert("!procInfo->inHeap"), File: "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
> >> > postgres: heikki postgres [local] CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
> >> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
> >> > postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
> >> > postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
> >> > postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
> >> > postgres: heikki postgres [local] CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]
> >>
> >> I think there's a similar race condition if the timeout is reached at
> >> the same time that the startup process wakes up the process.
> >
> > Thank you for catching this.  I think WaitForLSN() just needs to call
> > deleteLSNWaiter() unconditionally after exit from the loop.
>
> Fix and add injection point test on this race condition.
>
> > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas <hlinnaka@iki.fi>
> > wrote:
> >> The docs could use some-copy-editing, but just to point out one issue:
> >>
> >> > There are also procedures to control the progress of recovery.
> >>
> >> That's copy-pasted from an earlier sentence at the table that lists
> >> functions like pg_promote(), pg_wal_replay_pause(), and
> >> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control
> >> the
> >> progress of recovery like those functions do, it only causes the
> >> calling
> >> backend to wait.
>
> Fix documentation and add extra tests on multi-standby replication
> and cascade replication.

Thank you for the revised patch.

I see couple of items which are not addressed in this revision.
 * As Heikki pointed, that it's currently not possible in one round
trip to call call pg_wal_replay_wait() and do other job.  The attached
patch addresses this.  It milds the requirement.  Now, it's not
necessary to be in atomic context.  It's only necessary to have no
active snapshot.  This new requirement works for me so far.  I
appreciate a feedback on this.
 * As Alvaro pointed, multiple waiters case isn't covered by the test
suite.  That leads to no coverage of some code paths.  The attached
patch addresses this by adding a test case with multiple waiters.

The rest looks good to me.

Links.
1. https://www.postgresql.org/message-id/d1303584-b763-446c-9409-f4516118219f%40iki.fi
2.https://www.postgresql.org/message-id/202404051815.eri4u5q6oj26%40alvherre.pgsql

------
Regards,
Alexander Korotkov
Supabase

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: 回复: An implementation of multi-key sort
Следующее
От: Kohei KaiGai
Дата:
Сообщение: assertion failure at cost_memoize_rescan()