Обсуждение: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
От
Michael Paquier
Дата:
Hi all, When attempting to use multiple times pg_replication_slot_advance on a slot, then the caller gets back directly InvalidXLogRecPtr as result, for example: =# select * from pg_replication_slot_advance('popo', 'FF/0'); slot_name | end_lsn -----------+----------- popo | 0/60021E0 (1 row) =# select * from pg_replication_slot_advance('popo', 'FF/0'); slot_name | end_lsn -----------+--------- popo | 0/0 (1 row) Wouldn't it be more simple to return NULL to mean that the slot could not be moved forward? That would be easier to parse for clients. Please see the attached. Thanks, -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
От
Magnus Hagander
Дата:
On Fri, May 25, 2018 at 7:28 AM, Michael Paquier <michael@paquier.xyz> wrote:
-- Hi all,
When attempting to use multiple times pg_replication_slot_advance on a
slot, then the caller gets back directly InvalidXLogRecPtr as result,
for example:
=# select * from pg_replication_slot_advance('popo', 'FF/0');
slot_name | end_lsn
-----------+-----------
popo | 0/60021E0
(1 row)
=# select * from pg_replication_slot_advance('popo', 'FF/0');
slot_name | end_lsn
-----------+---------
popo | 0/0
(1 row)
Wouldn't it be more simple to return NULL to mean that the slot could
not be moved forward? That would be easier to parse for clients.
Please see the attached.
I agree that returning 0/0 on this is wrong.
However, can this actually occour for any case other than exactly the case of "moving the position to where the position already is"? If I look at the physical slot path at least that seems to eb the only case, and in that case I think the correct thing to return would be the new position, and not NULL. If we actually *fail* to move the position, we give an error.
Actually, isn't there also a race there? That is, if we try to move it, we check that we're not trying to move it backwards, and throw an error, but that's checked outside the lock. Then later we actually move it, and check *again* if we try to move it backwards, but if that one fails we return InvalidXLogRecPtr (which can happen in the case of concurrent activity on the slot, I think)? In this case, maybe we should just re-check that and raise an error appropriately?
(I haven't looked at the logical slot path, but I assume it would have something similar in it)
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
От
Simon Riggs
Дата:
On 25 May 2018 at 13:12, Magnus Hagander <magnus@hagander.net> wrote: > > > On Fri, May 25, 2018 at 7:28 AM, Michael Paquier <michael@paquier.xyz> > wrote: >> >> Hi all, >> >> When attempting to use multiple times pg_replication_slot_advance on a >> slot, then the caller gets back directly InvalidXLogRecPtr as result, >> for example: >> =# select * from pg_replication_slot_advance('popo', 'FF/0'); >> slot_name | end_lsn >> -----------+----------- >> popo | 0/60021E0 >> (1 row) >> =# select * from pg_replication_slot_advance('popo', 'FF/0'); >> slot_name | end_lsn >> -----------+--------- >> popo | 0/0 >> (1 row) >> >> Wouldn't it be more simple to return NULL to mean that the slot could >> not be moved forward? That would be easier to parse for clients. >> Please see the attached. > > > I agree that returning 0/0 on this is wrong. Agreed > However, can this actually occour for any case other than exactly the case > of "moving the position to where the position already is"? If I look at the > physical slot path at least that seems to eb the only case, and in that case > I think the correct thing to return would be the new position, and not NULL. Docs say "Returns name of the slot and real position to which it was advanced to." so agreed > If we actually *fail* to move the position, we give an error. > > Actually, isn't there also a race there? That is, if we try to move it, we > check that we're not trying to move it backwards, and throw an error, but > that's checked outside the lock. Then later we actually move it, and check > *again* if we try to move it backwards, but if that one fails we return > InvalidXLogRecPtr (which can happen in the case of concurrent activity on > the slot, I think)? In this case, maybe we should just re-check that and > raise an error appropriately? Agreed > (I haven't looked at the logical slot path, but I assume it would have > something similar in it) -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Fri, May 25, 2018 at 02:12:32PM +0200, Magnus Hagander wrote: > I agree that returning 0/0 on this is wrong. > > However, can this actually occour for any case other than exactly the case > of "moving the position to where the position already is"? If I look at the > physical slot path at least that seems to be the only case, and in that > case I think the correct thing to return would be the new position, and not > NULL. If we actually *fail* to move the position, we give an error. Yes, this only returns InvalidXLogRecPtr if the location could not be moved forward. Still, there is more going on here. For a physical slot, confirmed_lsn is always 0/0, hence the backward check is never applied for it. What I think should be used for value assigned to startlsn is restart_lsn instead. Then what do we do if the position cannot be moved: should we raise an error, as what my patch attached does, or just report the existing position the slot is at? A second error that I can see is in pg_logical_replication_slot_advance, which does not take the mutex lock of MyReplicationSlot, so concurrent callers like those of ReplicationSlotsComputeRequiredLSN (applies to physical slot actually) and pg_get_replication_slots() may read false data. On top of that, it seems to me that StartLogicalReplication is reading a couple of fields from a slot without taking a lock, so at least pg_get_replication_slots() may read incorrect data. ReplicationSlotReserveWal also is missing a lock.. Those are older than v11 though. > Actually, isn't there also a race there? That is, if we try to move it, we > check that we're not trying to move it backwards, and throw an error, but > that's checked outside the lock. Then later we actually move it, and check > *again* if we try to move it backwards, but if that one fails we return > InvalidXLogRecPtr (which can happen in the case of concurrent activity on > the slot, I think)? In this case, maybe we should just re-check that and > raise an error appropriately? Er, isn't the take here that ReplicationSlotAcquire is used to lock any replication slot update to happen from other backends? It seems to me that what counts at the end if if a backend PID is associated to a slot in memory. If you look at the code paths updating a logical or physical slot then those imply that the slot is owned, still a spin lock needs to be taken for concurrent readers. > (I haven't looked at the logical slot path, but I assume it would have > something similar in it) Found one. All the things I have spotted are in the patch attached. -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Mon, May 28, 2018 at 05:57:47PM +0900, Michael Paquier wrote: > Found one. All the things I have spotted are in the patch attached. Oops, forgot a ReplicationSlotRelease call. -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
От
Simon Riggs
Дата:
On 28 May 2018 at 09:57, Michael Paquier <michael@paquier.xyz> wrote: > On Fri, May 25, 2018 at 02:12:32PM +0200, Magnus Hagander wrote: >> I agree that returning 0/0 on this is wrong. >> >> However, can this actually occour for any case other than exactly the case >> of "moving the position to where the position already is"? If I look at the >> physical slot path at least that seems to be the only case, and in that >> case I think the correct thing to return would be the new position, and not >> NULL. If we actually *fail* to move the position, we give an error. > > Yes, this only returns InvalidXLogRecPtr if the location could not be > moved forward. Still, there is more going on here. For a physical > slot, confirmed_lsn is always 0/0, hence the backward check is never > applied for it. What I think should be used for value assigned to > startlsn is restart_lsn instead. Then what do we do if the position > cannot be moved: should we raise an error, as what my patch attached > does, or just report the existing position the slot is at? I don't see why an ERROR would be appropriate. > A second error that I can see is in pg_logical_replication_slot_advance, > which does not take the mutex lock of MyReplicationSlot, so concurrent > callers like those of ReplicationSlotsComputeRequiredLSN (applies to > physical slot actually) and pg_get_replication_slots() may read false > data. > > On top of that, it seems to me that StartLogicalReplication is reading a > couple of fields from a slot without taking a lock, so at least > pg_get_replication_slots() may read incorrect data. > ReplicationSlotReserveWal also is missing a lock.. Those are older than > v11 though. > >> Actually, isn't there also a race there? That is, if we try to move it, we >> check that we're not trying to move it backwards, and throw an error, but >> that's checked outside the lock. Then later we actually move it, and check >> *again* if we try to move it backwards, but if that one fails we return >> InvalidXLogRecPtr (which can happen in the case of concurrent activity on >> the slot, I think)? In this case, maybe we should just re-check that and >> raise an error appropriately? > > Er, isn't the take here that ReplicationSlotAcquire is used to lock any > replication slot update to happen from other backends? It seems to me > that what counts at the end if if a backend PID is associated to a slot > in memory. If you look at the code paths updating a logical or physical > slot then those imply that the slot is owned, still a spin lock needs to > be taken for concurrent readers. > >> (I haven't looked at the logical slot path, but I assume it would have >> something similar in it) > > Found one. All the things I have spotted are in the patch attached. I think the problem here is there are no comments explaining how to access the various fields in the structure, so there was no way to check whether the code was good or not. If we add corrective code we should clarify that in comments the .h file also, as is done in XlogCtl Your points look correct to me, well spotted. I'd like to separate the correction of these issues from the change of behavior patch. Those earlier issues can be backpatched, but the change of behavior only affects PG11. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Thu, May 31, 2018 at 06:31:24PM +0100, Simon Riggs wrote: Thanks for the input, Simon. I have been able to spend more time monitoring the slot-related code. > On 28 May 2018 at 09:57, Michael Paquier <michael@paquier.xyz> wrote: >> Yes, this only returns InvalidXLogRecPtr if the location could not be >> moved forward. Still, there is more going on here. For a physical >> slot, confirmed_lsn is always 0/0, hence the backward check is never >> applied for it. What I think should be used for value assigned to >> startlsn is restart_lsn instead. Then what do we do if the position >> cannot be moved: should we raise an error, as what my patch attached >> does, or just report the existing position the slot is at? > > I don't see why an ERROR would be appropriate. Okay, the caller can always compare if the returned position matches what happened in the past or not, so that's fine for me. About that, the LSN used as the initialization should then be startlsn instead of InvalidXLogRecPtr. >> A second error that I can see is in pg_logical_replication_slot_advance, >> which does not take the mutex lock of MyReplicationSlot, so concurrent >> callers like those of ReplicationSlotsComputeRequiredLSN (applies to >> physical slot actually) and pg_get_replication_slots() may read false >> data. >> >> On top of that, it seems to me that StartLogicalReplication is reading a >> couple of fields from a slot without taking a lock, so at least >> pg_get_replication_slots() may read incorrect data. >> ReplicationSlotReserveWal also is missing a lock.. Those are older than >> v11 though. Actually, there are two extra problems: - In CreateInitDecodingContext which can be called after the slot is marked as in use so there could be inconsistencies with pg_get_replication_slots() as well for catalog_xmin & co. - In DecodingContextFindStartpoint where confirmed_lsn is updated without the mutex taken. > I think the problem here is there are no comments explaining how to > access the various fields in the structure, so there was no way to > check whether the code was good or not. > > If we add corrective code we should clarify that in comments the .h > file also, as is done in XlogCtl Yes, I agree with you. There are at least two LWLocks used for the overall slot creation and handling, as well as a set of spin locks used for the fields. The code also does not mention in its comments that slots marked with in_use = false are not scanned at all by concurrent backends, but this is strongly implied in the code, hence you don't need to care about taking lock for them when working on fields for this slot as long as its flag in_use has not been marked to true. There is one code path which bypasses slots with in_use marked to true, but that's for the startup process recovering the slot data, so there is no need to care about locking in this case. > Your points look correct to me, well spotted. I'd like to separate the > correction of these issues from the change of behavior patch. Those > earlier issues can be backpatched, but the change of behavior only > affects PG11. Definitely, while the previous patch was around mainly to show where things are incorrect, I am attaching a set of patches for HEAD which can be used for commit: - One patch which addresses the several lock problems and adds some instructions about the variables protected by spinlocks for slots in use, which should be back-patched. - A second patch for HEAD which addresses what has been noticed for the new slot advance feature. This addresses as well the lock problems introduced in the new advance code, hopefully the split makes sense to others on this thread. Once again those only apply to HEAD, please feel free to ping me if you would like versions for back-branches (or anybody picking up those patches). Thanks, -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Fri, Jun 01, 2018 at 02:53:09PM -0400, Michael Paquier wrote: > Definitely, while the previous patch was around mainly to show where > things are incorrect, I am attaching a set of patches for HEAD which can > be used for commit: > - One patch which addresses the several lock problems and adds some > instructions about the variables protected by spinlocks for slots in > use, which should be back-patched. > - A second patch for HEAD which addresses what has been noticed for the > new slot advance feature. This addresses as well the lock problems > introduced in the new advance code, hopefully the split makes sense to > others on this thread. > Once again those only apply to HEAD, please feel free to ping me if you > would like versions for back-branches (or anybody picking up those > patches). And of course I found a typo just after sending those.. Please use the attached ones instead. -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
От
Petr Jelinek
Дата:
Hi, On 01/06/18 21:13, Michael Paquier wrote: > - startlsn = MyReplicationSlot->data.confirmed_flush; > + if (OidIsValid(MyReplicationSlot->data.database)) > + startlsn = MyReplicationSlot->data.confirmed_flush; > + else > + startlsn = MyReplicationSlot->data.restart_lsn; > + > if (moveto < startlsn) > { > ReplicationSlotRelease(); This part looks correct for the checking that we are not moving backwards. However, there is another existing issue with this code which is that we are later using the confirmed_flush (via startlsn) as start point of logical decoding (XLogReadRecord parameter in pg_logical_replication_slot_advance) which is not correct. The restart_lsn should be used for that. I think it would make sense to fix that as part of this patch as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Mon, Jun 04, 2018 at 11:51:35AM +0200, Petr Jelinek wrote: > On 01/06/18 21:13, Michael Paquier wrote: >> - startlsn = MyReplicationSlot->data.confirmed_flush; >> + if (OidIsValid(MyReplicationSlot->data.database)) >> + startlsn = MyReplicationSlot->data.confirmed_flush; >> + else >> + startlsn = MyReplicationSlot->data.restart_lsn; >> + >> if (moveto < startlsn) >> { >> ReplicationSlotRelease(); > > This part looks correct for the checking that we are not moving > backwards. However, there is another existing issue with this code which > is that we are later using the confirmed_flush (via startlsn) as start > point of logical decoding (XLogReadRecord parameter in > pg_logical_replication_slot_advance) which is not correct. The > restart_lsn should be used for that. I think it would make sense to fix > that as part of this patch as well. I am not sure I understand what you are coming at here. Could you explain your point a bit more please as 9c7d06d is yours? When creating the decoding context, all other code paths use the confirmed LSN as a start point if not explicitely defined by the caller of CreateDecodingContext, as it points to the last LSN where a transaction has been committed and decoded. The backward check is also correct to me, for which I propose to add a comment block like that: + /* + * Check if the slot is not moving backwards. Physical slots rely + * simply on restart_lsn as a minimum point, while logical slots + * have confirmed consumption up to confirmed_lsn, meaning that + * in both cases data older than that is not available anymore. + */ + if (OidIsValid(MyReplicationSlot->data.database)) + minlsn = MyReplicationSlot->data.confirmed_flush; + else + minlsn = MyReplicationSlot->data.restart_lsn; Any tests I do are showing me that using confirmed_lsn would not matter much? as we want the slot's consumer to still decode transactions whose commits happened after the point where the slot has been advanced to. So let's make sure that we are on the same page for the starting LSN used. On top of that, the locking issues in CreateInitDecodingContext() and DecodingContextFindStartpoint go back to 9.4. So I would be inclined to get 0001 applied first as a bug fix on all branches, still that's a minor issue so there could be arguments for just doing it on HEAD. I am as well fully open to suggestions for the extra comments which document the use of ReplicationSlotControlLock and mutex for in-memory slot data. Any thoughts about those two last points? -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Mon, Jun 04, 2018 at 11:51:35AM +0200, Petr Jelinek wrote: > On 01/06/18 21:13, Michael Paquier wrote: >> - startlsn =3D MyReplicationSlot->data.confirmed_flush; >> + if (OidIsValid(MyReplicationSlot->data.database)) >> + startlsn =3D MyReplicationSlot->data.confirmed_flush; >> + else >> + startlsn =3D MyReplicationSlot->data.restart_lsn; >> + >> if (moveto < startlsn) >> { >> ReplicationSlotRelease(); > > This part looks correct for the checking that we are not moving > backwards. However, there is another existing issue with this code > which > is that we are later using the confirmed_flush (via startlsn) as start > point of logical decoding (XLogReadRecord parameter in > pg_logical_replication_slot_advance) which is not correct. The > restart_lsn should be used for that. I think it would make sense to > fix > that as part of this patch as well. I am not sure I understand what you are coming at here. Could you explain your point a bit more please as 9c7d06d is yours? When creating the decoding context, all other code paths use the confirmed LSN as a start point if not explicitely defined by the caller of CreateDecodingContext, as it points to the last LSN where a transaction has been committed and decoded. The backward check is also correct to me, for which I propose to add a comment block like that: + /* + * Check if the slot is not moving backwards. Physical slots rely + * simply on restart_lsn as a minimum point, while logical slots + * have confirmed consumption up to confirmed_lsn, meaning that + * in both cases data older than that is not available anymore. + */ + if (OidIsValid(MyReplicationSlot->data.database)) + minlsn = MyReplicationSlot->data.confirmed_flush; + else + minlsn = MyReplicationSlot->data.restart_lsn; Any tests I do are showing me that using confirmed_lsn would not matter much? as we want the slot's consumer to still decode transactions whose commits happened after the point where the slot has been advanced to. So let's make sure that we are on the same page for the starting LSN used. On top of that, the locking issues in CreateInitDecodingContext() and DecodingContextFindStartpoint go back to 9.4. So I would be inclined to get 0001 applied first as a bug fix on all branches, still that's a minor issue so there could be arguments for just doing it on HEAD. I am as well fully open to suggestions for the extra comments which document the use of ReplicationSlotControlLock and mutex for in-memory slot data. Any thoughts about those two last points? -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
От
Petr Jelinek
Дата:
On 05/06/18 06:28, Michael Paquier wrote: > On Mon, Jun 04, 2018 at 11:51:35AM +0200, Petr Jelinek wrote: >> On 01/06/18 21:13, Michael Paquier wrote: >>> - startlsn =3D MyReplicationSlot->data.confirmed_flush; >>> + if (OidIsValid(MyReplicationSlot->data.database)) >>> + startlsn =3D MyReplicationSlot->data.confirmed_flush; >>> + else >>> + startlsn =3D MyReplicationSlot->data.restart_lsn; >>> + >>> if (moveto < startlsn) >>> { >>> ReplicationSlotRelease(); >> >> This part looks correct for the checking that we are not moving >> backwards. However, there is another existing issue with this code >> which >> is that we are later using the confirmed_flush (via startlsn) as start >> point of logical decoding (XLogReadRecord parameter in >> pg_logical_replication_slot_advance) which is not correct. The >> restart_lsn should be used for that. I think it would make sense to >> fix >> that as part of this patch as well. > > I am not sure I understand what you are coming at here. Could you > explain your point a bit more please as 9c7d06d is yours? As I said, it's an existing issue. I just had chance to reread the code when looking and your patch. > When creating > the decoding context, all other code paths use the confirmed LSN as a > start point if not explicitely defined by the caller of > CreateDecodingContext, as it points to the last LSN where a transaction > has been committed and decoded. I didn't say anything about CreateDecodingContext though. I am talking about the fact that we then use the same variable as input to XLogReadRecord later in the logical slot code path. The whole point of having restart_lsn for logical slot is to have correct start point for XLogReadRecord so using the confirmed_flush there is wrong (and has been wrong since the original commit). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Tue, Jun 05, 2018 at 01:00:30PM +0200, Petr Jelinek wrote: > I didn't say anything about CreateDecodingContext though. I am talking > about the fact that we then use the same variable as input to > XLogReadRecord later in the logical slot code path. The whole point of > having restart_lsn for logical slot is to have correct start point for > XLogReadRecord so using the confirmed_flush there is wrong (and has been > wrong since the original commit). OK, I finally see the point you are coming at after a couple of hours brainstorming about it, and indeed using confirmed_lsn is logically incorrect so the current code is inconsistent with what DecodingContextFindStartpoint() does, so we rely on restart_lsn to scan for all the records and the decoder state is here to make sure that the slot's confirmed_lsn is updated to a consistent position. I have added your feedback in the attached (indented and such), which results in some simplifications with a couple of routines. I am attaching as well the patch I sent yesterday. 0001 is candidate for a back-patch, 0002 is for HEAD to fix the slot advance stuff. There is another open item related to slot advancing here: https://www.postgresql.org/message-id/2840048a-1184-417a-9da8-3299d207a1d7%40postgrespro.ru And per my tests the patch is solving this item as well. I have just used the test mentioned in the thread which: 1) creates a slot 2) does some activity to generate a couple of WAL pages 3) advances the slot at page boundary 4) Moves again the slot. This test crashes on HEAD at step 4, and not with the attached. What do you think? -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
От
Petr Jelinek
Дата:
On 06/06/18 04:04, Michael Paquier wrote: > On Tue, Jun 05, 2018 at 01:00:30PM +0200, Petr Jelinek wrote: >> I didn't say anything about CreateDecodingContext though. I am talking >> about the fact that we then use the same variable as input to >> XLogReadRecord later in the logical slot code path. The whole point of >> having restart_lsn for logical slot is to have correct start point for >> XLogReadRecord so using the confirmed_flush there is wrong (and has been >> wrong since the original commit). > > OK, I finally see the point you are coming at after a couple of hours > brainstorming about it, and indeed using confirmed_lsn is logically > incorrect so the current code is inconsistent with what > DecodingContextFindStartpoint() does, so we rely on restart_lsn to scan > for all the records and the decoder state is here to make sure that the > slot's confirmed_lsn is updated to a consistent position. I have added > your feedback in the attached (indented and such), which results in some > simplifications with a couple of routines. > > I am attaching as well the patch I sent yesterday. 0001 is candidate > for a back-patch, 0002 is for HEAD to fix the slot advance stuff. > > There is another open item related to slot advancing here: > https://www.postgresql.org/message-id/2840048a-1184-417a-9da8-3299d207a1d7%40postgrespro.ru > And per my tests the patch is solving this item as well. I have just > used the test mentioned in the thread which: > 1) creates a slot > 2) does some activity to generate a couple of WAL pages > 3) advances the slot at page boundary > 4) Moves again the slot. > This test crashes on HEAD at step 4, and not with the attached. > > What do you think? Seems reasonable to me. I think the only thing to note about the patches from my side is that we probably don't want to default to restart_lsn for the pg_logical_replication_slot_advance() return value (when nothing was done) but rather the confirmed_lsn. As it is in current patch if we call the function repeatedly and one call moved slot forward but the next one didn't the return value will go backwards as restart_lsn tends to be behind the confirmed one. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Wed, Jun 06, 2018 at 04:57:22PM +0200, Petr Jelinek wrote: > I think the only thing to note about the patches from my side is that we > probably don't want to default to restart_lsn for the > pg_logical_replication_slot_advance() return value (when nothing was > done) but rather the confirmed_lsn. As it is in current patch if we call > the function repeatedly and one call moved slot forward but the next one > didn't the return value will go backwards as restart_lsn tends to be > behind the confirmed one. It does not matter much as the PG_TRY loop would still enforce the result to confirmed_lsn anyway if nothing happens, still let's do as you suggest as that's more consistent. -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Wed, Jun 06, 2018 at 11:04:39AM +0900, Michael Paquier wrote: > I am attaching as well the patch I sent yesterday. 0001 is candidate > for a back-patch, 0002 is for HEAD to fix the slot advance stuff. I have been able to look again at 0001 and pushed it as 9e149c8. As reading inconsistent data from replication slots is rather hard to trigger, I have just pushed the patch to HEAD. I'll look at 0002 tomorrow. -- Michael
Вложения
Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От
Michael Paquier
Дата:
On Sun, Jun 10, 2018 at 10:19:23PM +0900, Michael Paquier wrote: > I have been able to look again at 0001 and pushed it as 9e149c8. As > reading inconsistent data from replication slots is rather hard to > trigger, I have just pushed the patch to HEAD. I'll look at 0002 > tomorrow. And pushed 0002 as f731cfa9, so we should be good with this open item. -- Michael