Discussion: Forbid the use of invalidated physical slots in streaming replication.

Forbid the use of invalidated physical slots in streaming replication.
From:
"Zhijie Hou (Fujitsu)"
Date:
Hi,

When testing streaming replication with a physical slot, I found an unexpected
behavior: the walsender could use an invalidated physical slot for streaming.

This occurs when the primary slot is invalidated due to reaching
max_slot_wal_keep_size before streaming replication is initialized (e.g.
before creating the standby). Attached is a draft script
(test_invalidated_slot.sh) which can reproduce this.

Once the slot is invalidated, it can no longer protect the WALs and rows, as
invalidated slots are not considered in functions like
ReplicationSlotsComputeRequiredXmin().

Besides, the walsender could advance the restart_lsn of an invalidated slot,
and then the user won't be able to tell whether the slot is actually valid or
not, because the 'conflicting' column of the pg_replication_slots view could
be set back to null.

So, I think it's a bug, and one idea to fix it is to check the validity of the
physical slot in StartReplication() after acquiring the slot, like the
attachment does. What do you think?

Best Regards,
Hou Zhijie
Attachments
On Wed, Dec 6, 2023 at 6:25 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> When testing streaming replication with a physical slot, I found an unexpected
> behavior that the walsender could use an invalidated physical slot for
> streaming.
>
> This occurs when the primary slot is invalidated due to reaching the
> max_slot_wal_keep_size before initializing the streaming replication
> (e.g. before creating the standby). Attached is a draft script
> (test_invalidated_slot.sh) which can reproduce this.

Interesting. Thanks for the script. It reproduces the problem for me easily.

>
> Once the slot is invalidated, it can no longer protect the WALs and
> rows, as these invalidated slots are not considered in functions like
> ReplicationSlotsComputeRequiredXmin().
>
> Besides, the walsender could advance the restart_lsn of an invalidated slot,
> then the user won't be able to know whether the slot is actually valid or not,
> because the 'conflicting' column of the pg_replication_slots view could be set
> back to null.

In this case, since the basebackup was taken after the slot was invalidated,
it does not require the WAL that was removed. But it seems that once the
streaming starts, the slot sprints to life again and gets validated again.
Here's pg_replication_slots output after the standby starts (shown in
expanded format for readability):

#select * from pg_replication_slots;
-[ RECORD 1 ]-------+--------------
slot_name           | logicalslot
plugin              | test_decoding
slot_type           | logical
datoid              | 5
database            | postgres
temporary           | f
active              | f
active_pid          |
xmin                |
catalog_xmin        | 739
restart_lsn         |
confirmed_flush_lsn | 0/1513B08
wal_status          | lost
safe_wal_size       |
two_phase           | f
conflicting         | t
-[ RECORD 2 ]-------+--------------
slot_name           | physical
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 341925
xmin                | 752
catalog_xmin        |
restart_lsn         | 0/404CB78
confirmed_flush_lsn |
wal_status          | unreserved
safe_wal_size       | 16462984
two_phase           | f
conflicting         |

which looks quite similar to the output when the slot was valid after creation:

-[ RECORD 1 ]-------+--------------
slot_name           | logicalslot
plugin              | test_decoding
slot_type           | logical
datoid              | 5
database            | postgres
temporary           | f
active              | f
active_pid          |
xmin                |
catalog_xmin        | 739
restart_lsn         | 0/1513AD0
confirmed_flush_lsn | 0/1513B08
wal_status          | unreserved
safe_wal_size       | -1591888
two_phase           | f
conflicting         | f
-[ RECORD 2 ]-------+--------------
slot_name           | physical
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | f
active_pid          |
xmin                |
catalog_xmin        |
restart_lsn         | 0/14F0DF0
confirmed_flush_lsn |
wal_status          | unreserved
safe_wal_size       | -1591888
two_phase           | f
conflicting         |

>
> So, I think it's a bug and one idea to fix is to check the validity of the
> physical slot in StartReplication() after acquiring the slot like what the
> attachment does, what do you think ?

I am not sure whether that's necessarily a bug. Of course, we don't expect
invalid slots to be used, but in this case I don't see any harm. The invalid
slot has been revived and has all the properties set just like a valid slot.
So it must be considered in ReplicationSlotsComputeRequiredXmin() as well. I
haven't verified it myself though.
In case the WAL is really lost and is requested by the standby, it will throw
an error "requested WAL segment [0-9A-F]+ has already been removed". So no
harm there as well.

I haven't been able to locate the code which makes the slot valid though, so
I can't say whether the behaviour is intended or not. Looking at the
StartReplication() comment

	/*
	 * We don't need to verify the slot's restart_lsn here; instead we
	 * rely on the caller requesting the starting point to use. If the
	 * WAL segment doesn't exist, we'll fail later.
	 */

it looks like the behaviour is not completely unexpected.

--
Best Wishes,
Ashutosh Bapat
RE: Forbid the use of invalidated physical slots in streaming replication.
From:
"Zhijie Hou (Fujitsu)"
Date:
On Thursday, December 7, 2023 7:43 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

Hi,

> On Wed, Dec 6, 2023 at 6:25 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> wrote:
> >
> > When testing streaming replication with a physical slot, I found an
> > unexpected behavior that the walsender could use an invalidated
> > physical slot for streaming.
> >
> > This occurs when the primary slot is invalidated due to reaching the
> > max_slot_wal_keep_size before initializing the streaming replication
> > (e.g. before creating the standby). Attached is a draft
> > script (test_invalidated_slot.sh) which can reproduce this.
>
> Interesting. Thanks for the script. It reproduces the problem for me easily.

Thanks for testing and replying!

> >
> > Once the slot is invalidated, it can no longer protect the WALs and
> > rows, as these invalidated slots are not considered in functions like
> > ReplicationSlotsComputeRequiredXmin().
> >
> > Besides, the walsender could advance the restart_lsn of an invalidated
> > slot, then the user won't be able to know whether the slot is actually
> > valid or not, because the 'conflicting' column of the
> > pg_replication_slots view could be set back to null.
>
> In this case, since the basebackup was taken after the slot was invalidated, it
> does not require the WAL that was removed. But it seems that once the
> streaming starts, the slot sprints to life again and gets validated again. Here's
> pg_replication_slots output after the standby starts.

Actually, it doesn't bring the invalidated slot back to life completely. The
slot's view data looks valid, while the 'invalidated' flag of this slot is
still RS_INVAL_WAL_REMOVED (users are not aware of it).

> >
> > So, I think it's a bug and one idea to fix is to check the validity of
> > the physical slot in StartReplication() after acquiring the slot like
> > what the attachment does, what do you think ?
>
> I am not sure whether that's necessarily a bug. Of course, we don't expect
> invalid slots to be used but in this case I don't see any harm.
> The invalid slot has been revived and has all the properties set just like a
> valid slot. So it must be considered in
> ReplicationSlotsComputeRequiredXmin() as well. I haven't verified it myself
> though. In case the WAL is really lost and is requested by the standby it will
> throw an error "requested WAL segment [0-9A-F]+ has already been
> removed". So no harm there as well.

Since the 'invalidated' field of the slot is still RS_INVAL_WAL_REMOVED, even
if the walsender advances the restart_lsn, the slot will not be considered in
ReplicationSlotsComputeRequiredXmin(), so the WALs and rows are not safe, and
that's why I think it's a bug.

After looking closer, it seems this behavior started from 15f8203, which
introduced the ReplicationSlotInvalidationCause 'invalidated'; after that, we
check the invalidated enum in the Xmin/Lsn computation functions.

If we want to go back to the previous behavior, we need to revert/adjust the
check for invalidated in ReplicationSlotsComputeRequiredXmin(). But since
logical decoding (walsender) disallows using invalidated slots, I feel it's
consistent to do a similar check for physical ones. Besides,
pg_replication_slot_advance() also disallows passing invalidated slots to it.
What do you think?

Best Regards,
Hou zj
> > > pg_replication_slot could be set back to null.
> >
> > In this case, since the basebackup was taken after the slot was invalidated, it
> > does not require the WAL that was removed. But it seems that once the
> > streaming starts, the slot sprints to life again and gets validated again. Here's
> > pg_replication_slots output after the standby starts.
>
> Actually, it doesn't bring the invalidated slot back to life completely.
> The slot's view data looks valid while the 'invalidated' flag of this slot is
> still RS_INVAL_WAL_REMOVED (users are not aware of it).
>

I was misled by the code in pg_get_replication_slots(). I did not read it till
the following:

--- code ----
	case WALAVAIL_REMOVED:

		/*
		 * If we read the restart_lsn long enough ago, maybe that file
		 * has been removed by now. However, the walsender could have
		 * moved forward enough that it jumped to another file after
		 * we looked. If checkpointer signalled the process to
		 * termination, then it's definitely lost; but if a process is
		 * still alive, then "unreserved" seems more appropriate.
		 *
		 * If we do change it, save the state for safe_wal_size below.
		 */
--- code ---

I see now how an invalid slot's wal status can be reported as unreserved. So I
think it's a bug.

> >
> > So, I think it's a bug and one idea to fix is to check the validity of
> > the physical slot in StartReplication() after acquiring the slot like
> > what the attachment does, what do you think ?
>
> I am not sure whether that's necessarily a bug. Of course, we don't expect
> invalid slots to be used but in this case I don't see any harm.
> The invalid slot has been revived and has all the properties set just like a
> valid slot. So it must be considered in
> ReplicationSlotsComputeRequiredXmin() as well. I haven't verified it myself
> though. In case the WAL is really lost and is requested by the standby it will
> throw an error "requested WAL segment [0-9A-F]+ has already been
> removed". So no harm there as well.

> Since the 'invalidated' field of the slot is still RS_INVAL_WAL_REMOVED, even
> if the walsender advances the restart_lsn, the slot will not be considered in
> ReplicationSlotsComputeRequiredXmin(), so the WALs and rows are not safe
> and that's why I think it's a bug.
>
> After looking closer, it seems this behavior started from 15f8203 which
> introduced the ReplicationSlotInvalidationCause 'invalidated', after that we
> check the invalidated enum in the Xmin/Lsn computation functions.
>
> If we want to go back to previous behavior, we need to revert/adjust the check
> for invalidated in ReplicationSlotsComputeRequiredXmin(), but since logical
> decoding (walsender) disallows using invalidated slots, I feel it's consistent
> to do a similar check for physical ones. Besides, pg_replication_slot_advance()
> also disallows passing invalidated slots to it as well. What do you think ?

What happens if you run your script on a build prior to 15f8203? Purely from
reading the code, it looks like the physical slot would sprint back to life
since its restart_lsn would be updated. But I am not able to see what happens
to invalidated_at. It probably remains a valid LSN and the slot would still
not be considered in the xmin calculation. It would be good to be compatible
with the pre-15f8203 behaviour.

I think making logical and physical slot behaviour consistent would be better,
but if the inconsistent behaviour has been out there for some releases,
changing it now would break backward compatibility.

--
Best Wishes,
Ashutosh Bapat
On Fri, 8 Dec 2023 at 19:15, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> > > > pg_replication_slot could be set back to null.
> > >
> > > In this case, since the basebackup was taken after the slot was invalidated, it
> > > does not require the WAL that was removed. But it seems that once the
> > > streaming starts, the slot sprints to life again and gets validated again. Here's
> > > pg_replication_slots output after the standby starts.
> >
> > Actually, it doesn't bring the invalidated slot back to life completely.
> > The slot's view data looks valid while the 'invalidated' flag of this slot is
> > still RS_INVAL_WAL_REMOVED (users are not aware of it).
> >
>
> I was misled by the code in pg_get_replication_slots(). I did not
> read it till the following
>
> --- code ----
> case WALAVAIL_REMOVED:
>
>     /*
>      * If we read the restart_lsn long enough ago, maybe that file
>      * has been removed by now. However, the walsender could have
>      * moved forward enough that it jumped to another file after
>      * we looked. If checkpointer signalled the process to
>      * termination, then it's definitely lost; but if a process is
>      * still alive, then "unreserved" seems more appropriate.
>      *
>      * If we do change it, save the state for safe_wal_size below.
>      */
> --- code ---
>
> I see now how an invalid slot's wal status can be reported as
> unreserved. So I think it's a bug.
>
> > >
> > > > So, I think it's a bug and one idea to fix is to check the validity of
> > > > the physical slot in StartReplication() after acquiring the slot like
> > > > what the attachment does, what do you think ?
> > >
> > > I am not sure whether that's necessarily a bug. Of course, we don't expect
> > > invalid slots to be used but in this case I don't see any harm.
> > > The invalid slot has been revived and has all the properties set just like a
> > > valid slot. So it must be considered in
> > > ReplicationSlotsComputeRequiredXmin() as well. I haven't verified it myself
> > > though. In case the WAL is really lost and is requested by the standby it will
> > > throw an error "requested WAL segment [0-9A-F]+ has already been
> > > removed". So no harm there as well.
> >
> > Since the 'invalidated' field of the slot is still RS_INVAL_WAL_REMOVED, even
> > if the walsender advances the restart_lsn, the slot will not be considered in
> > ReplicationSlotsComputeRequiredXmin(), so the WALs and rows are not safe
> > and that's why I think it's a bug.
> >
> > After looking closer, it seems this behavior started from 15f8203 which
> > introduced the ReplicationSlotInvalidationCause 'invalidated', after that we
> > check the invalidated enum in the Xmin/Lsn computation functions.
> >
> > If we want to go back to previous behavior, we need to revert/adjust the check
> > for invalidated in ReplicationSlotsComputeRequiredXmin(), but since logical
> > decoding (walsender) disallows using invalidated slots, I feel it's consistent
> > to do a similar check for physical ones. Besides, pg_replication_slot_advance()
> > also disallows passing invalidated slots to it as well. What do you think ?
>
> What happens if you run your script on a build prior to 15f8203?
> Purely from reading the code, it looks like the physical slot would
> sprint back to life since its restart_lsn would be updated. But I am
> not able to see what happens to invalidated_at. It probably remains a
> valid LSN and the slot would still not be considered in the xmin
> calculation. It would be good to be compatible with the pre-15f8203
> behaviour.

I have changed the patch status to "Waiting on Author", as some of the queries
requested by Ashutosh are not yet addressed.

Regards,
Vignesh
On Thu, Dec 7, 2023 at 8:00 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> After looking closer, it seems this behavior started from 15f8203 which
> introduced the ReplicationSlotInvalidationCause 'invalidated', after that we
> check the invalidated enum in the Xmin/Lsn computation functions.

Adding Andres in Cc, as that was his commit.

It's not entirely clear to me how this feature was intended to interact with
physical replication slots. I found seemingly-relevant documentation in two
places:

https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE
https://www.postgresql.org/docs/current/view-pg-replication-slots.html

In the latter, it says "unreserved means that the slot no longer retains the
required WAL files and some of them are to be removed at the next checkpoint.
This state can return to reserved or extended." But if a slot becomes invalid
in such a way that it cannot return to a valid state later, then this isn't
accurate. I have a feeling that the actual behavior here has evolved and the
documentation hasn't kept up. And I wonder whether we need a more
comprehensive explanation of the intended behavior in this section:

https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, 8 Jan 2024 at 10:25, vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, 8 Dec 2023 at 19:15, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > > > > pg_replication_slot could be set back to null.
> > > >
> > > > In this case, since the basebackup was taken after the slot was invalidated, it
> > > > does not require the WAL that was removed. But it seems that once the
> > > > streaming starts, the slot sprints to life again and gets validated again. Here's
> > > > pg_replication_slots output after the standby starts.
> > >
> > > Actually, it doesn't bring the invalidated slot back to life completely.
> > > The slot's view data looks valid while the 'invalidated' flag of this slot is
> > > still RS_INVAL_WAL_REMOVED (users are not aware of it).
> > >
> >
> > I was misled by the code in pg_get_replication_slots(). I did not
> > read it till the following
> >
> > --- code ----
> > case WALAVAIL_REMOVED:
> >
> >     /*
> >      * If we read the restart_lsn long enough ago, maybe that file
> >      * has been removed by now. However, the walsender could have
> >      * moved forward enough that it jumped to another file after
> >      * we looked. If checkpointer signalled the process to
> >      * termination, then it's definitely lost; but if a process is
> >      * still alive, then "unreserved" seems more appropriate.
> >      *
> >      * If we do change it, save the state for safe_wal_size below.
> >      */
> > --- code ---
> >
> > I see now how an invalid slot's wal status can be reported as
> > unreserved. So I think it's a bug.
> >
> > > >
> > > > > So, I think it's a bug and one idea to fix is to check the validity of
> > > > > the physical slot in StartReplication() after acquiring the slot like
> > > > > what the attachment does, what do you think ?
> > > >
> > > > I am not sure whether that's necessarily a bug. Of course, we don't expect
> > > > invalid slots to be used but in this case I don't see any harm.
> > > > The invalid slot has been revived and has all the properties set just like a
> > > > valid slot. So it must be considered in
> > > > ReplicationSlotsComputeRequiredXmin() as well. I haven't verified it myself
> > > > though. In case the WAL is really lost and is requested by the standby it will
> > > > throw an error "requested WAL segment [0-9A-F]+ has already been
> > > > removed". So no harm there as well.
> > >
> > > Since the 'invalidated' field of the slot is still RS_INVAL_WAL_REMOVED, even
> > > if the walsender advances the restart_lsn, the slot will not be considered in
> > > ReplicationSlotsComputeRequiredXmin(), so the WALs and rows are not safe
> > > and that's why I think it's a bug.
> > >
> > > After looking closer, it seems this behavior started from 15f8203 which
> > > introduced the ReplicationSlotInvalidationCause 'invalidated', after that we
> > > check the invalidated enum in the Xmin/Lsn computation functions.
> > >
> > > If we want to go back to previous behavior, we need to revert/adjust the check
> > > for invalidated in ReplicationSlotsComputeRequiredXmin(), but since logical
> > > decoding (walsender) disallows using invalidated slots, I feel it's consistent
> > > to do a similar check for physical ones. Besides, pg_replication_slot_advance()
> > > also disallows passing invalidated slots to it as well. What do you think ?
> >
> > What happens if you run your script on a build prior to 15f8203?
> > Purely from reading the code, it looks like the physical slot would
> > sprint back to life since its restart_lsn would be updated. But I am
> > not able to see what happens to invalidated_at. It probably remains a
> > valid LSN and the slot would still not be considered in the xmin
> > calculation. It would be good to be compatible with the pre-15f8203
> > behaviour.
>
> I have changed the patch status to "Waiting on Author", as some of the
> queries requested by Ashutosh are not yet addressed.

The patch which you submitted has been awaiting your attention for quite some
time now. As such, we have moved it to "Returned with Feedback" and removed it
from the reviewing queue. Depending on timing, this may be reversible. Kindly
address the feedback you have received, and resubmit the patch to the next
CommitFest.

Regards,
Vignesh