Discussion: Logical walsenders don't process XLOG_CHECKPOINT_SHUTDOWN
Currently, we don't perform $SUBJECT when the server shuts down. For now, I think the impact is minor: after a restart, subscribers will ask to start processing from before the XLOG_CHECKPOINT_SHUTDOWN record, or after a switchover the old publisher will have an extra WAL record.

However, if we want to support upgrading the publisher node such that the existing slots are copied/created in the new cluster, we need to ensure that all the changes generated on the publisher have been sent to and applied on the subscriber. This is a hard requirement because we reset the WAL after the upgrade, so any WAL that has not been sent will be lost. Even a clean shutdown of the publisher node can't ensure that all the WAL has been sent, because the subscriber node may well be down, in which case no walsenders will be available at shutdown time to send the data. Similarly, there could be logical slots created via a backend that have not processed all the data, and we can't copy those slots as-is during the upgrade.

To ensure that all the data has been sent during the upgrade, we can check that each logical slot's confirmed_flush_lsn (the position in the WAL up to which the subscriber has confirmed that it has applied the WAL) is the same as current_wal_insert_lsn. But because we don't send XLOG_CHECKPOINT_SHUTDOWN even on a clean shutdown, confirmed_flush_lsn will never be the same as current_wal_insert_lsn. One idea being discussed in patch [1] (see 0003) is to check that each slot's LSN is exactly one XLOG_CHECKPOINT_SHUTDOWN record behind the end of the WAL. That has drawbacks: if we later add some other WAL in the shutdown checkpoint path, or the size of the record changes, we would need to modify the corresponding code in the upgrade.

The other possibility is to allow logical walsenders to process XLOG_CHECKPOINT_SHUTDOWN before shutdown, after which confirmed_flush_lsn will be the same as current_wal_insert_lsn during the upgrade. AFAICU, the primary reason we don't allow it is that we want to avoid writing any new WAL after the shutdown checkpoint (to avoid any sort of PANIC, as discussed in the thread [2]). Writing WAL is possible during decoding because of hint bits, but it doesn't seem that decoding XLOG_CHECKPOINT_SHUTDOWN can lead to any hint bit updates. It seems we made these changes as part of commit c6c3334364 [3].

Note that even if we can ensure that walsenders send all the WAL before shutdown and bring the corresponding logical slots up-to-date so that there is no pending data, it would still be possible that logical slots created manually via backends won't consume all the WAL before shutdown. I think those will be the responsibility of users, as they are the ones who created them. We can also provide some guidelines to users, similar to what we have for physical standbys in the pg_upgrade docs [4] (see: 9. Prepare for standby server upgrades). Something like: before upgrading, verify that the subscriber is caught up with the publisher by comparing the current WAL position on the publisher with pg_stat_subscription.received_lsn on the subscriber.

Any better ideas or thoughts on the above?

[1] - https://www.postgresql.org/message-id/TYAPR01MB586619721863B7FFDAC4369FF550A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/CAHGQGwEsttg9P9LOOavoc9d6VB1zVmYgfBk%3DLjsk-UL9cEf-eA%40mail.gmail.com
[3] - commit c6c333436491a292d56044ed6e167e2bdee015a2
      Author: Andres Freund <andres@anarazel.de>
      Date: Mon Jun 5 18:53:41 2017 -0700
      Prevent possibility of panics during shutdown checkpoint.
[4] - https://www.postgresql.org/docs/devel/pgupgrade.html

--
With Regards,
Amit Kapila.

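
The caught-up check suggested in the message above amounts to comparing LSNs. Below is a minimal sketch of that comparison, assuming the positions have already been read as text (for example from pg_replication_slots.confirmed_flush_lsn and pg_current_wal_insert_lsn()). The helper names and the sample values are hypothetical and are not part of any patch in this thread.

```python
def parse_lsn(text):
    """Convert a textual LSN such as '0/16B3748' to a 64-bit integer.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of the WAL position.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slots_with_pending_wal(slots, end_lsn):
    """Return the names of slots whose confirmed position is behind end_lsn.

    slots   -- mapping of slot name to confirmed_flush_lsn (text form)
    end_lsn -- the WAL position every slot is expected to have reached
    """
    end = parse_lsn(end_lsn)
    return sorted(name for name, lsn in slots.items() if parse_lsn(lsn) < end)


# Hypothetical sample values, for illustration only.
slots = {"sub_a": "0/3000060", "sub_b": "0/2FFFF00"}
pending = slots_with_pending_wal(slots, "0/3000060")
print(pending)  # sub_b has not confirmed everything up to 0/3000060
```

As the message points out, a strict equality test against the insert position would fail today even after a clean shutdown, because the shutdown checkpoint record itself is never sent; the sketch only shows the comparison, not how to choose end_lsn.
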
Hi,

On 2023-07-25 14:31:00 +0530, Amit Kapila wrote:
> To ensure that all the data has been sent during the upgrade, we can
> ensure that each logical slot's confirmed_flush_lsn (position in the
> WAL till which subscriber has confirmed that it has applied the WAL)
> is the same as current_wal_insert_lsn. Now, because we don't send
> XLOG_CHECKPOINT_SHUTDOWN even on clean shutdown, confirmed_flush_lsn
> will never be the same as current_wal_insert_lsn. The one idea being
> discussed in patch [1] (see 0003) is to ensure that each slot's LSN is
> exactly XLOG_CHECKPOINT_SHUTDOWN ago which probably has some drawbacks
> like what if we tomorrow add some other WAL in the shutdown checkpoint
> path or the size of record changes then we would need to modify the
> corresponding code in upgrade.

Yea, that doesn't seem like a good path. But there is a variant that seems
better: We could just scan the end of the WAL for records that should have
been streamed out?

Greetings,

Andres Freund

On Tue, Jul 25, 2023 at 10:33 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-07-25 14:31:00 +0530, Amit Kapila wrote:
> > To ensure that all the data has been sent during the upgrade, we can
> > ensure that each logical slot's confirmed_flush_lsn (position in the
> > WAL till which subscriber has confirmed that it has applied the WAL)
> > is the same as current_wal_insert_lsn. Now, because we don't send
> > XLOG_CHECKPOINT_SHUTDOWN even on clean shutdown, confirmed_flush_lsn
> > will never be the same as current_wal_insert_lsn. The one idea being
> > discussed in patch [1] (see 0003) is to ensure that each slot's LSN is
> > exactly XLOG_CHECKPOINT_SHUTDOWN ago which probably has some drawbacks
> > like what if we tomorrow add some other WAL in the shutdown checkpoint
> > path or the size of record changes then we would need to modify the
> > corresponding code in upgrade.
>
> Yea, that doesn't seem like a good path. But there is a variant that seems
> better: We could just scan the end of the WAL for records that should have
> been streamed out?
>

This sounds like a better idea. One way to realize it is to group slots by confirmed_flush_lsn and then scan based on that. Once we have verified that the slot group with the highest confirmed_flush_lsn is up-to-date (it has no pending WAL except for the shutdown checkpoint), any slot group with a lower confirmed_flush_lsn would be considered a group with pending data.

BTW, I think the main reason for not trying to send XLOG_CHECKPOINT_SHUTDOWN from logical walsenders is that, even if there is no risk today of hint bit updates (or of generating WAL in any other way) while decoding XLOG_CHECKPOINT_SHUTDOWN, there is no guarantee that this will remain true in the future. Is there anything I am missing here?

--
With Regards,
Amit Kapila.
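
The grouping idea from the last message can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the WAL tail is modelled as a list of (start_lsn, record_type) pairs with LSNs as plain integers, every record other than the shutdown checkpoint is treated as one that should have been streamed, and `classify_slots` and the sample data are hypothetical names, not code from any patch.

```python
from collections import defaultdict

# Label used in this sketch for the shutdown checkpoint record.
SHUTDOWN_CHECKPOINT = "XLOG_CHECKPOINT_SHUTDOWN"


def classify_slots(slots, wal_tail):
    """Split slot names into (up_to_date, pending).

    slots    -- mapping of slot name to confirmed_flush_lsn (integer)
    wal_tail -- list of (start_lsn, record_type) pairs in WAL order
    """
    # Group slots that share the same confirmed_flush_lsn.
    groups = defaultdict(list)
    for name, lsn in slots.items():
        groups[lsn].append(name)
    if not groups:
        return [], []

    highest = max(groups)
    # Scan only records starting at or after the most advanced position;
    # the tail is clean if nothing but the shutdown checkpoint is left.
    tail_is_clean = all(
        rtype == SHUTDOWN_CHECKPOINT
        for start, rtype in wal_tail
        if start >= highest
    )

    up_to_date = sorted(groups[highest]) if tail_is_clean else []
    # Every slot behind the highest group is treated as having pending data.
    pending = sorted(set(slots) - set(up_to_date))
    return up_to_date, pending


# Hypothetical sample data, for illustration only.
slots = {"a": 0x3000028, "b": 0x3000028, "c": 0x2000000}
wal_tail = [(0x2000000, "HEAP_INSERT"), (0x3000028, SHUTDOWN_CHECKPOINT)]
up_to_date, pending = classify_slots(slots, wal_tail)
```

Only one scan of the WAL tail is needed in this scheme, because the verdict for the most advanced group decides every other group: either that group is clean and all slots behind it are pending, or nothing is clean at all.
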