RE: Slow catchup of 2PC (twophase) transactions on replica in LR

From: Hayato Kuroda (Fujitsu)
Subject: RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Msg-id: OSBPR01MB2552707A847936E6803CFAA5F5092@OSBPR01MB2552.jpnprd01.prod.outlook.com
In response to: Re: Slow catchup of 2PC (twophase) transactions on replica in LR  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Slow catchup of 2PC (twophase) transactions on replica in LR  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers

Dear Amit,

> Vitaly, does the minimal solution provided by the proposed patch
> (Allow to alter two_phase option of a subscriber provided no uncommitted
> prepared transactions are pending on that subscription.) address your use case?

I think we do not have to handle the cases in which there are prepared
transactions on the publisher/subscriber, as a first step. It leads to
additional complexity and we do not have smarter solutions, especially for
problem 2.
IIUC it meets Vitaly's condition, right?

> > 1. While toggling two_phase from true to false, we could probably get a list of
> > prepared transactions for this subscriber id and rollback/abort the prepared
> > transactions. This will allow the transactions to be re-applied like a normal
> > transaction when the commit comes. Alternatively, if this isn't appropriate doing it
> > in the ALTER SUBSCRIPTION context, we could store the xids of all prepared
> > transactions of this subscription in a list and when the corresponding xid is being
> > committed by the apply worker, prior to commit, we make sure the previously
> > prepared transaction is rolled back. But this would add the overhead of checking
> > this list every time a transaction is committed by the apply worker.
> >
> 
> In the second solution, if you check at the time of commit whether
> there exists a prior prepared transaction then won't we end up
> applying the changes twice? I think we can first try to achieve it at
> the time of Alter Subscription because the other solution can add
> overhead at each commit?

Yeah, at least the second solution might be problematic. I prototyped
the first one and it worked well. However, to keep the feature consistent,
for now the command is rejected if prepared transactions exist on the
subscriber. We can ease this restriction based on the requirements.
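
To illustrate what "prepared transactions on the subscriber" means here: as far
as I know, the apply worker names the transactions it prepares
pg_gid_<subid>_<xid>, so the pending ones can be found by filtering the gid
column of pg_prepared_xacts. Below is a minimal Python sketch of that filter.
The function name and the sample values are hypothetical and it is not the code
in the patch.

```python
import re

# Assumed GID layout for transactions prepared by the apply worker:
# "pg_gid_<subscription oid>_<remote xid>".
GID_RE = re.compile(r"^pg_gid_(\d+)_(\d+)$")


def pending_prepared_for_sub(gids, subid):
    """Return remote xids of prepared transactions that belong to subid."""
    xids = []
    for gid in gids:
        m = GID_RE.match(gid)
        if m and int(m.group(1)) == subid:
            xids.append(int(m.group(2)))
    return xids


# Hypothetical values of pg_prepared_xacts.gid on the subscriber.
sample_gids = ["pg_gid_16394_741", "app_txn_1", "pg_gid_16400_742"]
pending = pending_prepared_for_sub(sample_gids, 16394)
```

If the returned list is not empty, ALTER SUBSCRIPTION would be rejected under
the current restriction.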

> > 2. No solution yet.
> >
> 
> One naive idea is that on the publisher we can remember whether the
> prepare has been sent and if so then only send commit_prepared,
> otherwise send the entire transaction. On the subscriber-side, we
> somehow, need to ensure before applying the first change whether the
> corresponding transaction is already prepared and if so then skip the
> changes and just perform the commit prepared. One drawback of this
> approach is that after restart, the prepare flag wouldn't be saved in
> the memory and we end up sending the entire transaction again. One way
> to avoid this overhead is that the publisher before sending the entire
> transaction checks with subscriber whether it has a prepared
> transaction corresponding to the current commit. I understand that
> this is not a good idea even if it works but I don't have any better
> ideas. What do you think?

I considered it, but I am not sure it is good to add such a mechanism. Your idea
requires an additional wait-loop, which might lead to bugs and unexpected
behavior. It may also degrade performance, depending on the network environment.
As for the other solution (the worker sends a list of prepared transactions), it
is also not so good because the list of prepared transactions may be huge.

Based on the above, I think we can reject the case for now.

FYI - We also considered the idea that the walsender waits until all prepared
transactions are resolved before decoding and sending changes, but it did not
work well: the restarted walsender sent only the COMMIT PREPARED record for
transactions which had been prepared before disabling the subscription. This
happened because
1) if the two_phase option of the slot is false, confirmed_flush can be ahead of
   the PREPARE record, and
2) after altering the option and restarting, start_decoding_at becomes the same
   as confirmed_flush, and records behind it won't be decoded.
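
The two points above can be reproduced with a toy model. This is only a
simplified Python sketch that follows the description in this mail, not the
actual walsender logic; the LSN values, the record tuples, and the function name
are made up for illustration.

```python
def decode_from(wal, start_decoding_at):
    """Toy model: only records at or after start_decoding_at are decoded."""
    return [rec for rec in wal if rec[0] >= start_decoding_at]


# (lsn, record kind) for one transaction; the LSNs are arbitrary.
wal = [
    (100, "CHANGE"),
    (110, "PREPARE"),
    (200, "COMMIT PREPARED"),
]

# 1) With two_phase = false the subscriber may already have confirmed a
#    position beyond the PREPARE record.
confirmed_flush = 150

# 2) After altering the option and restarting, decoding starts there.
start_decoding_at = confirmed_flush

sent = decode_from(wal, start_decoding_at)
sent_kinds = [kind for _, kind in sent]
```

In this model sent_kinds contains only COMMIT PREPARED, so the subscriber would
receive a commit for a transaction it never prepared.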

> > 3. We could mandate that the altering of two_phase state only be done after
> > disabling the subscription, just like how it is handled for failover option.
> >
> 
> makes sense.

OK, this spec was added.

Based on the above, I updated the patch with Ajin.
0001 - extends the ALTER SUBSCRIPTION statement. Tab-completion was also added.
0002 - mandates that the subscription has been disabled. Since there is no need
       to change AtEOXact_ApplyLauncher(), that change is reverted.
       If there are no objections, this can be included in 0001.
0003 - checks whether there are transactions prepared by the worker. If found,
       rejects the ALTER SUBSCRIPTION command.
0004 - checks whether there are transactions prepared on the publisher. The
       backend connects to the publisher and confirms it. If found, rejects the
       ALTER SUBSCRIPTION command.
0005 - adds a TAP test for it.
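
Taken together, 0002-0004 behave like a chain of preconditions. The Python
sketch below only illustrates that decision flow. The function name, the
messages, and the order of the checks are assumptions (the order simply follows
the patch numbering), and the real checks are done in C inside the backend.

```python
def check_alter_two_phase(sub_enabled, subscriber_prepared, publisher_prepared):
    """Return None if two_phase may be altered, otherwise a rejection reason."""
    if sub_enabled:
        # 0002: the subscription must be disabled first.
        return "subscription must be disabled"
    if subscriber_prepared:
        # 0003: transactions prepared by the apply worker still exist.
        return "prepared transactions exist on the subscriber"
    if publisher_prepared:
        # 0004: the backend connects to the publisher and finds prepared ones.
        return "prepared transactions exist on the publisher"
    return None


# Disabled subscription with nothing pending on either side: allowed.
result_ok = check_alter_two_phase(False, [], [])

# Disabled subscription, but one transaction still prepared on the subscriber.
result_rejected = check_alter_two_phase(False, ["pg_gid_16394_741"], [])
```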

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 

