RE: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Synchronizing slots from primary to standby
Дата
Msg-id OS0PR01MB5716923268FE208003DC784094462@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Список pgsql-hackers
On Tuesday, February 6, 2024 3:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > >
> > > ---
> > > Since Two processes (e.g. the slotsync worker and
> > > pg_sync_replication_slots()) concurrently fetch and update the slot
> > > information, there is a race condition where slot's
> > > confirmed_flush_lsn goes backward.
> > >
> >
> > Right, this is possible, though there shouldn't be a problem because
> > anyway, slotsync is an async process. Till we hold restart_lsn, the
> > required WAL won't be removed. Having said that, I can think of two
> > ways to avoid it: (a) We can have some flag in shared memory using
> > which we can detect whether any other process is doing slot
> > syncronization and then either error out at that time or simply wait
> > or may take nowait kind of parameter from user to decide what to do?
> > If this is feasible, we can simply error out for the first version and
> > extend it later if we see any use cases for the same (b) similar to
> > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an
> > error, this is good for now but in future we may still have another
> > similar issue, so I would prefer (a) among these but I am fine if you
> > prefer (b) or have some other ideas like just note down in comments
> > that this is a harmless case and can happen only very rarely.
> 
> Thank you for sharing the ideas. I would prefer (a). For (b), the same issue still
> happens for other fields.

Attach the V79 patch which includes the following changes. (Note that only
0001 is sent in this version, we will send the later patches after rebasing)

1. Address all the comments from Amit[1], all the comments from Peter[2] and some of
   the comments from Sawada-san[3].
2. Using a flag in shared to memory to restrcit concurrent slot sync.
3. Add more tap tests for pg_sync_replication_slots function.

[1] https://www.postgresql.org/message-id/CAA4eK1KGHT9S-Bst_G1CUNQvRep%3DipMs5aTBNRQFVi6TogbJ9w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPtyoRf3adoLoTrbL6momzkhXAFKz656Vv9YRu4cp%3D6Yig%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAD21AoCEkcTaPb%2BGdOhSQE49_mKJG6D64quHcioJGx6RCqMv%2BQ%40mail.gmail.com

Best Regards,
Hou zj

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)
Следующее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: RE: Synchronizing slots from primary to standby