Обсуждение: Track in pg_replication_slots the reason why slots conflict?

Поиск
Список
Период
Сортировка

Track in pg_replication_slots the reason why slots conflict?

От
Michael Paquier
Дата:
Hi all,
(Bertrand and Andres in CC.)

While listening at Bertrand's talk about logical decoding on standbys
last week at Prague, I got surprised by the fact that we do not
reflect in the catalogs the reason why a conflict happened for a slot.
There are three of them depending on ReplicationSlotInvalidationCause:
- WAL removed.
- Invalid horizon.
- Insufficient WAL level.

This idea has been hinted around here on the original thread that led
to be87200efd93:
https://www.postgresql.org/message-id/d7547f2c-a0c3-6aad-b631-b7ed5efaf298@gmail.com
However v44 has picked up the idea of a boolean:
https://www.postgresql.org/message-id/bdc49e0b-cd39-bcd3-e391-b0ad6e48b5cf@gmail.com

ReplicationSlotCtl holds this information, so couldn't it be useful
for monitoring purposes to know why a slot got invalidated and add a
column to pg_get_replication_slots()?  This could just be an extra
text conflicting_reason, defaulting to NULL when there's nothing to
see.

Thoughts?
--
Michael

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Thu, Dec 21, 2023 at 5:51 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> While listening at Bertrand's talk about logical decoding on standbys
> last week at Prague, I got surprised by the fact that we do not
> reflect in the catalogs the reason why a conflict happened for a slot.
> There are three of them depending on ReplicationSlotInvalidationCause:
> - WAL removed.
> - Invalid horizon.
> - Insufficient WAL level.
>

The invalidation cause is also required by one of the features being
discussed "Synchronize slots from primary to standby" [1] and there is
already a thread to discuss the same [2]. As that thread started
yesterday only, you may not have noticed it. Currently, the proposal
is to expose it via a function but we can extend it to also display
via view, feel free to share your opinion on that thread.

[1] - https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
[2] -
https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: Track in pg_replication_slots the reason why slots conflict?

От
Michael Paquier
Дата:
On Thu, Dec 21, 2023 at 08:20:16AM +0530, Amit Kapila wrote:
> The invalidation cause is also required by one of the features being
> discussed "Synchronize slots from primary to standby" [1] and there is
> already a thread to discuss the same [2]. As that thread started
> yesterday only, you may not have noticed it. Currently, the proposal
> is to expose it via a function but we can extend it to also display
> via view, feel free to share your opinion on that thread.
>
> [1] - https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
> [2] -
https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com

Ah thanks, missed this one.  This cannot use a separate function,
actually, and there is a good reason for that that has not been
mentioned.  I'll jump there.
--
Michael

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Andres Freund
Дата:
Hi,

On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> While listening at Bertrand's talk about logical decoding on standbys
> last week at Prague, I got surprised by the fact that we do not
> reflect in the catalogs the reason why a conflict happened for a slot.
> There are three of them depending on ReplicationSlotInvalidationCause:
> - WAL removed.
> - Invalid horizon.
> - Insufficient WAL level.

It should be extremely rare to hit any of these other than "WAL removed", so
I'm not sure it's worth adding interface complexity to show them.


> ReplicationSlotCtl holds this information, so couldn't it be useful
> for monitoring purposes to know why a slot got invalidated and add a
> column to pg_get_replication_slots()?  This could just be an extra
> text conflicting_reason, defaulting to NULL when there's nothing to
> see.

Extra columns aren't free from a usability perspective. IFF we do something, I
think it should be a single column with a cause.

Greetings,

Andres Freund



Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > While listening at Bertrand's talk about logical decoding on standbys
> > last week at Prague, I got surprised by the fact that we do not
> > reflect in the catalogs the reason why a conflict happened for a slot.
> > There are three of them depending on ReplicationSlotInvalidationCause:
> > - WAL removed.
> > - Invalid horizon.
> > - Insufficient WAL level.
>
> It should be extremely rare to hit any of these other than "WAL removed", so
> I'm not sure it's worth adding interface complexity to show them.
>
>
> > ReplicationSlotCtl holds this information, so couldn't it be useful
> > for monitoring purposes to know why a slot got invalidated and add a
> > column to pg_get_replication_slots()?  This could just be an extra
> > text conflicting_reason, defaulting to NULL when there's nothing to
> > see.
>
> Extra columns aren't free from a usability perspective. IFF we do something, I
> think it should be a single column with a cause.

Thanks for the feedback. But do you mean that we replace existing
'conflicting' column with 'cause' in both the function and view
(pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
that we expose 'cause' from pg_get_replication_slots() and use that to
display 'conflicting' in pg_replication_slots view?

And if we plan to return/display cause from either function or view,
then shall it be enum 'ReplicationSlotInvalidationCause' or
description/text corresponding to enum?

 In the other feature being discussed "Synchronize slots from primary
to standby" [1] , there is a requirement to replicate invalidation
cause of slot from the primary to standby and thus it is needed in
enum form there. And thus there was a suggestion earlier to have the
function return enum-value and let the view display it as
text/description to the user.  So kindly let us know your thoughts.

[1] - https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com

thanks
Shveta



Re: Track in pg_replication_slots the reason why slots conflict?

От
Andres Freund
Дата:
Hi,

On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > > While listening at Bertrand's talk about logical decoding on standbys
> > > last week at Prague, I got surprised by the fact that we do not
> > > reflect in the catalogs the reason why a conflict happened for a slot.
> > > There are three of them depending on ReplicationSlotInvalidationCause:
> > > - WAL removed.
> > > - Invalid horizon.
> > > - Insufficient WAL level.
> >
> > It should be extremely rare to hit any of these other than "WAL removed", so
> > I'm not sure it's worth adding interface complexity to show them.
> >
> >
> > > ReplicationSlotCtl holds this information, so couldn't it be useful
> > > for monitoring purposes to know why a slot got invalidated and add a
> > > column to pg_get_replication_slots()?  This could just be an extra
> > > text conflicting_reason, defaulting to NULL when there's nothing to
> > > see.
> >
> > Extra columns aren't free from a usability perspective. IFF we do something, I
> > think it should be a single column with a cause.
>
> Thanks for the feedback. But do you mean that we replace existing
> 'conflicting' column with 'cause' in both the function and view
> (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> that we expose 'cause' from pg_get_replication_slots() and use that to
> display 'conflicting' in pg_replication_slots view?

I'm not entirely sure I understand the difference - just whether we add one
new column or replace the existing 'conflicting' column? I can see arguments
for either. A conflicting column where NULL indicates no conflict, and other
values indicate the reason for the conflict, doesn't seem too bad.


> And if we plan to return/display cause from either function or view,
> then shall it be enum 'ReplicationSlotInvalidationCause' or
> description/text corresponding to enum?

We clearly can't just expose the numerical value for a C enum. So it has to be
converted to something SQL representable.


>  In the other feature being discussed "Synchronize slots from primary
> to standby" [1] , there is a requirement to replicate invalidation
> cause of slot from the primary to standby and thus it is needed in
> enum form there. And thus there was a suggestion earlier to have the
> function return enum-value and let the view display it as
> text/description to the user.  So kindly let us know your thoughts.
>
> [1] - https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com

Can you point me to a more specific message for that requirement? It seems
pretty odd to me. Your link goes to the top of a 400 message thread, I don't
have time to find one specific design point in that...

Greetings,

Andres



Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Thu, Dec 21, 2023 at 5:04 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> > On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > > > While listening at Bertrand's talk about logical decoding on standbys
> > > > last week at Prague, I got surprised by the fact that we do not
> > > > reflect in the catalogs the reason why a conflict happened for a slot.
> > > > There are three of them depending on ReplicationSlotInvalidationCause:
> > > > - WAL removed.
> > > > - Invalid horizon.
> > > > - Insufficient WAL level.
> > >
> > > It should be extremely rare to hit any of these other than "WAL removed", so
> > > I'm not sure it's worth adding interface complexity to show them.
> > >
> > >
> > > > ReplicationSlotCtl holds this information, so couldn't it be useful
> > > > for monitoring purposes to know why a slot got invalidated and add a
> > > > column to pg_get_replication_slots()?  This could just be an extra
> > > > text conflicting_reason, defaulting to NULL when there's nothing to
> > > > see.
> > >
> > > Extra columns aren't free from a usability perspective. IFF we do something, I
> > > think it should be a single column with a cause.
> >
> > Thanks for the feedback. But do you mean that we replace existing
> > 'conflicting' column with 'cause' in both the function and view
> > (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> > that we expose 'cause' from pg_get_replication_slots() and use that to
> > display 'conflicting' in pg_replication_slots view?
>
> I'm not entirely sure I understand the difference - just whether we add one
> new column or replace the existing 'conflicting' column? I can see arguments
> for either. A conflicting column where NULL indicates no conflict, and other
> values indicate the reason for the conflict, doesn't seem too bad.
>
>
> > And if we plan to return/display cause from either function or view,
> > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > description/text corresponding to enum?
>
> We clearly can't just expose the numerical value for a C enum. So it has to be
> converted to something SQL representable.
>
>
> >  In the other feature being discussed "Synchronize slots from primary
> > to standby" [1] , there is a requirement to replicate invalidation
> > cause of slot from the primary to standby and thus it is needed in
> > enum form there. And thus there was a suggestion earlier to have the
> > function return enum-value and let the view display it as
> > text/description to the user.  So kindly let us know your thoughts.
> >
> > [1] - https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
>
> Can you point me to a more specific message for that requirement? It seems
> pretty odd to me. Your link goes to the top of a 400 message thread, I don't
> have time to find one specific design point in that...

It is currently implemented there as a new function
'pg_get_slot_invalidation_cause()'  without changing existing view
pg_replication_slots. (See 2.1 in [1] where it was introduced).

Then it was suggested in [2] to fork a new thread as it makes sense to
have it independent of this slot-synchronization feature.

The new thread forked is [3]. In that thread, the issues in having a
new function pg_get_slot_invalidation_cause() are discussed and also
we came to know about this very thread that started the next day.

[1]: https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1K0KCDNtpDyUKucMRdyK-5KdrCRWakCpHEdHT9muAiEOw%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com

thanks
Shveta



Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Thu, Dec 21, 2023 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> > On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Extra columns aren't free from a usability perspective. IFF we do something, I
> > > think it should be a single column with a cause.
> >
> > Thanks for the feedback. But do you mean that we replace existing
> > 'conflicting' column with 'cause' in both the function and view
> > (pg_get_replication_slots() and pg_replication_slots)?  Or do you mean
> > that we expose 'cause' from pg_get_replication_slots() and use that to
> > display 'conflicting' in pg_replication_slots view?
>
> I'm not entirely sure I understand the difference - just whether we add one
> new column or replace the existing 'conflicting' column? I can see arguments
> for either.
>

Agreed. I think the argument against replacing the existing
'conflicting' column is that there is a chance that it is being used
by some monitoring script which I guess shouldn't be a big deal to
change. So, if we don't see that as a problem, I would prefer to have
a single column with conflict reason where one of its values indicates
there is no conflict.

A conflicting column where NULL indicates no conflict, and other
> values indicate the reason for the conflict, doesn't seem too bad.
>

This is fine too.

>
> > And if we plan to return/display cause from either function or view,
> > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > description/text corresponding to enum?
>
> We clearly can't just expose the numerical value for a C enum. So it has to be
> converted to something SQL representable.
>

We can return int2 value from the function pg_get_replication_slots()
and then use that to display a string in the view
pg_replication_slots.

--
With Regards,
Amit Kapila.



Re: Track in pg_replication_slots the reason why slots conflict?

От
Bertrand Drouvot
Дата:
Hi,

On Thu, Dec 21, 2023 at 07:55:51PM +0530, Amit Kapila wrote:
> On Thu, Dec 21, 2023 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:
> > I'm not entirely sure I understand the difference - just whether we add one
> > new column or replace the existing 'conflicting' column? I can see arguments
> > for either.
> >
> 
> Agreed. I think the argument against replacing the existing
> 'conflicting' column is that there is a chance that it is being used
> by some monitoring script which I guess shouldn't be a big deal to
> change. So, if we don't see that as a problem, I would prefer to have
> a single column with conflict reason where one of its values indicates
> there is no conflict.

+1

> A conflicting column where NULL indicates no conflict, and other
> > values indicate the reason for the conflict, doesn't seem too bad.
> >
> 
> This is fine too.

+1

> >
> > > And if we plan to return/display cause from either function or view,
> > > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > > description/text corresponding to enum?
> >
> > We clearly can't just expose the numerical value for a C enum. So it has to be
> > converted to something SQL representable.
> >
> 
> We can return int2 value from the function pg_get_replication_slots()
> and then use that to display a string in the view
> pg_replication_slots.

Yeah, and in the sync slot related work we could use pg_get_replication_slots()
then to get the enum.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Track in pg_replication_slots the reason why slots conflict?

От
Andres Freund
Дата:
On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
> On Thu, Dec 21, 2023 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:
> > We clearly can't just expose the numerical value for a C enum. So it has to be
> > converted to something SQL representable.
> >
> 
> We can return int2 value from the function pg_get_replication_slots()
> and then use that to display a string in the view
> pg_replication_slots.

I strongly dislike that pattern. It just leads to complicated views - and
doesn't provide a single benefit that I am aware of. It's much bettter to
simply populate the text version in pg_get_replication_slots().



Re: Track in pg_replication_slots the reason why slots conflict?

От
Michael Paquier
Дата:
On Thu, Dec 21, 2023 at 07:26:56AM -0800, Andres Freund wrote:
> On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
>> We can return int2 value from the function pg_get_replication_slots()
>> and then use that to display a string in the view
>> pg_replication_slots.
>
> I strongly dislike that pattern. It just leads to complicated views - and
> doesn't provide a single benefit that I am aware of. It's much bettter to
> simply populate the text version in pg_get_replication_slots().

I agree that this is a better integration in the view, and that's what
I would do FWIW.

Amit, how much of a problem would it be to do a text->enum mapping
when synchronizing the slots from a primary to a standby?  Sure you
could have a system function that does some of the mapping work, but I
am not sure what's the best integration when it comes to the other
patch.
--
Michael

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Fri, Dec 22, 2023 at 5:00 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Dec 21, 2023 at 07:26:56AM -0800, Andres Freund wrote:
> > On 2023-12-21 19:55:51 +0530, Amit Kapila wrote:
> >> We can return int2 value from the function pg_get_replication_slots()
> >> and then use that to display a string in the view
> >> pg_replication_slots.
> >
> > I strongly dislike that pattern. It just leads to complicated views - and
> > doesn't provide a single benefit that I am aware of. It's much bettter to
> > simply populate the text version in pg_get_replication_slots().
>
> I agree that this is a better integration in the view, and that's what
> I would do FWIW.
>
> Amit, how much of a problem would it be to do a text->enum mapping
> when synchronizing the slots from a primary to a standby?
>

There is no problem as such in that. We were trying to see if there is
a more convenient way but let's move by having cause as text from both
the function and view as that seems to be a preferred way.

--
With Regards,
Amit Kapila.



Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Thu, Dec 21, 2023 at 8:21 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Thu, Dec 21, 2023 at 07:55:51PM +0530, Amit Kapila wrote:
> > On Thu, Dec 21, 2023 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:
> > > I'm not entirely sure I understand the difference - just whether we add one
> > > new column or replace the existing 'conflicting' column? I can see arguments
> > > for either.
> > >
> >
> > Agreed. I think the argument against replacing the existing
> > 'conflicting' column is that there is a chance that it is being used
> > by some monitoring script which I guess shouldn't be a big deal to
> > change. So, if we don't see that as a problem, I would prefer to have
> > a single column with conflict reason where one of its values indicates
> > there is no conflict.
>
> +1
>

Does anyone else have a preference on whether to change the existing
column or add a new one?

--
With Regards,
Amit Kapila.



Re: Track in pg_replication_slots the reason why slots conflict?

От
Michael Paquier
Дата:
On Tue, Dec 26, 2023 at 08:44:44AM +0530, Amit Kapila wrote:
> Does anyone else have a preference on whether to change the existing
> column or add a new one?

Just to be clear here, I'd vote for replacing the existing boolean
with a text.
--
Michael

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Bertrand Drouvot
Дата:
Hi,

On Tue, Dec 26, 2023 at 05:23:56PM +0900, Michael Paquier wrote:
> On Tue, Dec 26, 2023 at 08:44:44AM +0530, Amit Kapila wrote:
> > Does anyone else have a preference on whether to change the existing
> > column or add a new one?
> 
> Just to be clear here, I'd vote for replacing the existing boolean
> with a text.

Same here, I'd vote to avoid 2 columns having the same "meaning".

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Track in pg_replication_slots the reason why slots conflict?

От
Isaac Morland
Дата:
On Thu, 21 Dec 2023 at 09:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
 
A conflicting column where NULL indicates no conflict, and other
> values indicate the reason for the conflict, doesn't seem too bad.
>

This is fine too.

I prefer this option. There is precedent for doing it this way, for example in pg_stat_activity.wait_event_type.

The most common test of this field is likely to be "is there a conflict" and it's better to write this as "[fieldname] IS NOT NULL" than to introduce a magic constant. Also, it makes clear to future maintainers that this field has one purpose: saying what type of conflict there is, if any. If we find ourselves wanting to record a new non-conflict status (no idea what that could be: "almost conflict"? "probably conflict soon"?) there would be less temptation to break existing tests for "is there a conflict".

Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Tue, Dec 26, 2023 at 7:35 PM Isaac Morland <isaac.morland@gmail.com> wrote:
>
> On Thu, 21 Dec 2023 at 09:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>>
>> A conflicting column where NULL indicates no conflict, and other
>> > values indicate the reason for the conflict, doesn't seem too bad.
>> >
>>
>> This is fine too.
>
>
> I prefer this option. There is precedent for doing it this way, for example in pg_stat_activity.wait_event_type.
>
> The most common test of this field is likely to be "is there a conflict" and it's better to write this as
"[fieldname]IS NOT NULL" than to introduce a magic constant. Also, it makes clear to future maintainers that this field
hasone purpose: saying what type of conflict there is, if any. If we find ourselves wanting to record a new
non-conflictstatus (no idea what that could be: "almost conflict"? "probably conflict soon"?) there would be less
temptationto break existing tests for "is there a conflict". 

+1 on using "[fieldname] IS NOT NULL"  to  find  "is there a conflict"

PFA the patch which attempts to implement this.

This patch changes the existing 'conflicting' field to
'conflicting_cause' in pg_replication_slots. This new field is always
NULL for physical slots (like the previous field conflicting), as well
as for those logical slots which are not invalidated.

thanks
Shveta

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Wed, Dec 27, 2023 at 3:08 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> PFA the patch which attempts to implement this.
>
> This patch changes the existing 'conflicting' field to
> 'conflicting_cause' in pg_replication_slots.
>

This name sounds a bit odd to me, would it be better to name it as
conflict_cause?

A few other minor comments:
=========================
*
+       <structfield>conflicting_cause</structfield> <type>text</type>
+      </para>
+      <para>
+       Cause if this logical slot conflicted with recovery (and so is now
+       invalidated). It is always NULL for physical slots, as well as for
+       those logical slots which are not invalidated. Possible values are:

Would it better to use description as follows:" Cause of logical
slot's conflict with recovery. It is always NULL for physical slots,
as well as for logical slots which are not invalidated. The non-NULL
values indicate that the slot is marked as invalidated. Possible
values are:
.."

*
  $res = $node_standby->safe_psql(
  'postgres', qq(
- select bool_and(conflicting) from pg_replication_slots;));
+ select bool_and(conflicting) from
+ (select conflicting_cause is not NULL as conflicting from
pg_replication_slots);));

Won't the query "select conflicting_cause is not NULL as conflicting
from pg_replication_slots" can return false even for physical slots
and then impact the result of the main query whereas the original
query would seem to be simply ignoring physical slots? If this
observation is correct then you might want to add a 'slot_type'
condition in the new query.

* After introducing check_slots_conflicting_cause(), do we need to
have check_slots_conflicting_status()? Aren't both checking the same
thing?

--
With Regards,
Amit Kapila.



Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Wed, Dec 27, 2023 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 27, 2023 at 3:08 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > PFA the patch which attempts to implement this.
> >
> > This patch changes the existing 'conflicting' field to
> > 'conflicting_cause' in pg_replication_slots.
> >
>
> This name sounds a bit odd to me, would it be better to name it as
> conflict_cause?
>
> A few other minor comments:
> =========================
> *
> +       <structfield>conflicting_cause</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       Cause if this logical slot conflicted with recovery (and so is now
> +       invalidated). It is always NULL for physical slots, as well as for
> +       those logical slots which are not invalidated. Possible values are:
>
> Would it better to use description as follows:" Cause of logical
> slot's conflict with recovery. It is always NULL for physical slots,
> as well as for logical slots which are not invalidated. The non-NULL
> values indicate that the slot is marked as invalidated. Possible
> values are:
> .."
>
> *
>   $res = $node_standby->safe_psql(
>   'postgres', qq(
> - select bool_and(conflicting) from pg_replication_slots;));
> + select bool_and(conflicting) from
> + (select conflicting_cause is not NULL as conflicting from
> pg_replication_slots);));
>
> Won't the query "select conflicting_cause is not NULL as conflicting
> from pg_replication_slots" can return false even for physical slots
> and then impact the result of the main query whereas the original
> query would seem to be simply ignoring physical slots? If this
> observation is correct then you might want to add a 'slot_type'
> condition in the new query.
>
> * After introducing check_slots_conflicting_cause(), do we need to
> have check_slots_conflicting_status()? Aren't both checking the same
> thing?

I think it is needed for the case where we want to check that there is
no conflict.

# Verify slots are reported as non conflicting in pg_replication_slots
check_slots_conflicting_status(0);

For the cases where there is conflict, I think
check_slots_conflicting_cause() can replace
check_slots_conflicting_status().

thanks
Shveta



Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Thu, Dec 28, 2023 at 10:16 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Dec 27, 2023 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 27, 2023 at 3:08 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > PFA the patch which attempts to implement this.
> > >
> > > This patch changes the existing 'conflicting' field to
> > > 'conflicting_cause' in pg_replication_slots.
> > >
> >
> > This name sounds a bit odd to me, would it be better to name it as
> > conflict_cause?
> >
> > A few other minor comments:
> > =========================

Thanks for the feedback Amit.

> > *
> > +       <structfield>conflicting_cause</structfield> <type>text</type>
> > +      </para>
> > +      <para>
> > +       Cause if this logical slot conflicted with recovery (and so is now
> > +       invalidated). It is always NULL for physical slots, as well as for
> > +       those logical slots which are not invalidated. Possible values are:
> >
> > Would it better to use description as follows:" Cause of logical
> > slot's conflict with recovery. It is always NULL for physical slots,
> > as well as for logical slots which are not invalidated. The non-NULL
> > values indicate that the slot is marked as invalidated. Possible
> > values are:
> > .."
> >
> > *
> >   $res = $node_standby->safe_psql(
> >   'postgres', qq(
> > - select bool_and(conflicting) from pg_replication_slots;));
> > + select bool_and(conflicting) from
> > + (select conflicting_cause is not NULL as conflicting from
> > pg_replication_slots);));
> >
> > Won't the query "select conflicting_cause is not NULL as conflicting
> > from pg_replication_slots" can return false even for physical slots
> > and then impact the result of the main query whereas the original
> > query would seem to be simply ignoring physical slots? If this
> > observation is correct then you might want to add a 'slot_type'
> > condition in the new query.
> >
> > * After introducing check_slots_conflicting_cause(), do we need to
> > have check_slots_conflicting_status()? Aren't both checking the same
> > thing?
>
> I think it is needed for the case where we want to check that there is
> no conflict.
>
> # Verify slots are reported as non conflicting in pg_replication_slots
> check_slots_conflicting_status(0);
>
> For the cases where there is conflict, I think
> check_slots_conflicting_cause() can replace
> check_slots_conflicting_status().

I have removed check_slots_conflicting_status() and where it was
needed to check non-conflicting, I have added a simple query.

PFA the v2-patch with all your comments addressed.

thanks
Shveta

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Thu, Dec 28, 2023 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> PFA the v2-patch with all your comments addressed.
>

Does anyone have a preference for a column name? The options on the
table are conflict_cause, conflicting_cause, conflict_reason. Any
others? I was checking docs for similar usage and found
"pg_event_trigger_table_rewrite_reason" function, so based on that we
can even go with conflict_reason.

--
With Regards,
Amit Kapila.



Re: Track in pg_replication_slots the reason why slots conflict?

От
Michael Paquier
Дата:
On Fri, Dec 29, 2023 at 09:20:52AM +0530, Amit Kapila wrote:
> Does anyone have a preference for a column name? The options on the
> table are conflict_cause, conflicting_cause, conflict_reason. Any
> others? I was checking docs for similar usage and found
> "pg_event_trigger_table_rewrite_reason" function, so based on that we
> can even go with conflict_reason.

"conflict_reason" sounds like the natural choice here.
--
Michael

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Fri, Dec 29, 2023 at 3:35 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Dec 29, 2023 at 09:20:52AM +0530, Amit Kapila wrote:
> > Does anyone have a preference for a column name? The options on the
> > table are conflict_cause, conflicting_cause, conflict_reason. Any
> > others? I was checking docs for similar usage and found
> > "pg_event_trigger_table_rewrite_reason" function, so based on that we
> > can even go with conflict_reason.
>
> "conflict_reason" sounds like the natural choice here.

Do we have more comments on the patch apart from column_name?

thanks
Shveta



Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Mon, Jan 1, 2024 at 9:14 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Dec 29, 2023 at 3:35 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Fri, Dec 29, 2023 at 09:20:52AM +0530, Amit Kapila wrote:
> > > Does anyone have a preference for a column name? The options on the
> > > table are conflict_cause, conflicting_cause, conflict_reason. Any
> > > others? I was checking docs for similar usage and found
> > > "pg_event_trigger_table_rewrite_reason" function, so based on that we
> > > can even go with conflict_reason.
> >
> > "conflict_reason" sounds like the natural choice here.
>
> Do we have more comments on the patch apart from column_name?
>
> thanks
> Shveta

PFA v3 after changing column name to 'conflict_reason'

thanks
Shveta

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Mon, Jan 1, 2024 at 12:32 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> PFA v3 after changing column name to 'conflict_reason'
>

Few minor comments:
===================
1.
+          <para>
+           <literal>wal_removed</literal> = required WAL has been removed.
+          </para>
+         </listitem>
+         <listitem>
+          <para>
+           <literal>rows_removed</literal> = required rows have been removed.
+          </para>
+         </listitem>
+         <listitem>
+          <para>
+           <literal>wal_level_insufficient</literal> = wal_level
insufficient on the primary server.
+          </para>

Should we use the same style to write the description as we are using
for the wal_status column? For example, <literal>wal_removed</literal>
means that the required WAL has been removed.

2.
+      <para>
+       The reason of logical slot's conflict with recovery.

My grammar tool says it should be: "The reason for the logical slot's
conflict with recovery."

Other than these minor comments, the patch looks good to me.

--
With Regards,
Amit Kapila.



Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Mon, Jan 1, 2024 at 4:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 1, 2024 at 12:32 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > PFA v3 after changing column name to 'conflict_reason'
> >
>
> Few minor comments:
> ===================
> 1.
> +          <para>
> +           <literal>wal_removed</literal> = required WAL has been removed.
> +          </para>
> +         </listitem>
> +         <listitem>
> +          <para>
> +           <literal>rows_removed</literal> = required rows have been removed.
> +          </para>
> +         </listitem>
> +         <listitem>
> +          <para>
> +           <literal>wal_level_insufficient</literal> = wal_level
> insufficient on the primary server.
> +          </para>
>
> Should we use the same style to write the description as we are using
> for the wal_status column? For example, <literal>wal_removed</literal>
> means that the required WAL has been removed.
>
> 2.
> +      <para>
> +       The reason of logical slot's conflict with recovery.
>
> My grammar tool says it should be: "The reason for the logical slot's
> conflict with recovery."
>
> Other than these minor comments, the patch looks good to me.

PFA  v4 which addresses the doc comments.

thanks
Shveta

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
shveta malik
Дата:
On Mon, Jan 1, 2024 at 5:17 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Jan 1, 2024 at 4:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jan 1, 2024 at 12:32 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > PFA v3 after changing column name to 'conflict_reason'
> > >
> >
> > Few minor comments:
> > ===================
> > 1.
> > +          <para>
> > +           <literal>wal_removed</literal> = required WAL has been removed.
> > +          </para>
> > +         </listitem>
> > +         <listitem>
> > +          <para>
> > +           <literal>rows_removed</literal> = required rows have been removed.
> > +          </para>
> > +         </listitem>
> > +         <listitem>
> > +          <para>
> > +           <literal>wal_level_insufficient</literal> = wal_level
> > insufficient on the primary server.
> > +          </para>
> >
> > Should we use the same style to write the description as we are using
> > for the wal_status column? For example, <literal>wal_removed</literal>
> > means that the required WAL has been removed.
> >
> > 2.
> > +      <para>
> > +       The reason of logical slot's conflict with recovery.
> >
> > My grammar tool says it should be: "The reason for the logical slot's
> > conflict with recovery."
> >
> > Other than these minor comments, the patch looks good to me.
>
> PFA  v4 which addresses the doc comments.

Please ignore the previous patch and PFA new v4 (v4_2). The only
change from the earlier v4 is the subject correction in commit msg.

thanks
Shveta

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Mon, Jan 1, 2024 at 5:24 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Please ignore the previous patch and PFA new v4 (v4_2). The only
> change from the earlier v4 is the subject correction in commit msg.
>

The patch looks good to me. I have slightly changed one of the
descriptions in the docs and also modified the commit message a bit. I
will push this after two days unless there are any more
comments/suggestions.

--
With Regards,
Amit Kapila.

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Bertrand Drouvot
Дата:
Hi,

On Tue, Jan 02, 2024 at 10:35:59AM +0530, Amit Kapila wrote:
> On Mon, Jan 1, 2024 at 5:24 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Please ignore the previous patch and PFA new v4 (v4_2). The only
> > change from the earlier v4 is the subject correction in commit msg.
> >

Thanks for the patch!

> The patch looks good to me. I have slightly changed one of the
> descriptions in the docs and also modified the commit message a bit. I
> will push this after two days unless there are any more
> comments/suggestions.
>

The patch LGTM, I just have a Nit comment:

+           <literal>wal_level_insufficient</literal> means that the
+           <xref linkend="guc-wal-level"/> is insufficient on the primary
+           server.

I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I think it's
better to directly mention it is linked to the primary (without the need to refer
to the documentation) and that the fact that it is "insufficient" is more or less
implicit.

Basically I think that with "primary_wal_level" one would need to refer to the doc
less frequently than with "wal_level_insufficient".

But again, that's a Nit so feel free to ignore.

Regards,
 
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Track in pg_replication_slots the reason why slots conflict?

От
Michael Paquier
Дата:
On Tue, Jan 02, 2024 at 02:07:58PM +0000, Bertrand Drouvot wrote:
> +           <literal>wal_level_insufficient</literal> means that the
> +           <xref linkend="guc-wal-level"/> is insufficient on the primary
> +           server.
>
> I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I think it's
> better to directly mention it is linked to the primary (without the need to refer
> to the documentation) and that the fact that it is "insufficient" is more or less
> implicit.
>
> Basically I think that with "primary_wal_level" one would need to refer to the doc
> less frequently than with "wal_level_insufficient".

I can see your point, but wal_level_insufficient speaks a bit more to
me because of its relationship with the GUC setting.   Something like
wal_level_insufficient_on_primary may speak better, but that's also
quite long.  I'm OK with what the patch does.

+       as invalidated. Possible values are:
+        <itemizedlist spacing="compact">
Higher-level nit: indentation seems to be one space off here.
--
Michael

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Amit Kapila
Дата:
On Wed, Jan 3, 2024 at 7:10 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 02, 2024 at 02:07:58PM +0000, Bertrand Drouvot wrote:
> > +           <literal>wal_level_insufficient</literal> means that the
> > +           <xref linkend="guc-wal-level"/> is insufficient on the primary
> > +           server.
> >
> > I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I think it's
> > better to directly mention it is linked to the primary (without the need to refer
> > to the documentation) and that the fact that it is "insufficient" is more or less
> > implicit.
> >
> > Basically I think that with "primary_wal_level" one would need to refer to the doc
> > less frequently than with "wal_level_insufficient".
>
> I can see your point, but wal_level_insufficient speaks a bit more to
> me because of its relationship with the GUC setting.   Something like
> wal_level_insufficient_on_primary may speak better, but that's also
> quite long.  I'm OK with what the patch does.
>

Thanks, I also prefer "wal_level_insufficient". To me
"primary_wal_level" sounds more along the lines of a GUC name than the
conflict_reason. The other names that come to mind are
"wal_level_lower_than_required", "wal_level_lower",
"wal_level_lesser_than_required", "wal_level_lesser" but I feel
"wal_level_insufficient" sounds better than these. Having said that, I
am open to any of these or better options for this conflict_reason.

> +       as invalidated. Possible values are:
> +        <itemizedlist spacing="compact">
> Higher-level nit: indentation seems to be one space off here.
>

Thanks, fixed in the attached patch.

--
With Regards,
Amit Kapila.

Вложения

Re: Track in pg_replication_slots the reason why slots conflict?

От
Bertrand Drouvot
Дата:
Hi,

On Wed, Jan 03, 2024 at 08:53:44AM +0530, Amit Kapila wrote:
> On Wed, Jan 3, 2024 at 7:10 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Tue, Jan 02, 2024 at 02:07:58PM +0000, Bertrand Drouvot wrote:
> > > +           <literal>wal_level_insufficient</literal> means that the
> > > +           <xref linkend="guc-wal-level"/> is insufficient on the primary
> > > +           server.
> > >
> > > I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I think it's
> > > better to directly mention it is linked to the primary (without the need to refer
> > > to the documentation) and that the fact that it is "insufficient" is more or less
> > > implicit.
> > >
> > > Basically I think that with "primary_wal_level" one would need to refer to the doc
> > > less frequently than with "wal_level_insufficient".
> >
> > I can see your point, but wal_level_insufficient speaks a bit more to
> > me because of its relationship with the GUC setting.   Something like
> > wal_level_insufficient_on_primary may speak better, but that's also
> > quite long.  I'm OK with what the patch does.
> >
> 
> Thanks, I also prefer "wal_level_insufficient". To me
> "primary_wal_level" sounds more along the lines of a GUC name than the
> conflict_reason. The other names that come to mind are
> "wal_level_lower_than_required", "wal_level_lower",
> "wal_level_lesser_than_required", "wal_level_lesser" but I feel
> "wal_level_insufficient" sounds better than these. Having said that, I
> am open to any of these or better options for this conflict_reason.
> 

Thank you both for giving your thoughts on it, I got your points and I'm OK with
"wal_level_insufficient".

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Track in pg_replication_slots the reason why slots conflict?

От
vignesh C
Дата:
On Wed, 3 Jan 2024 at 08:54, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 3, 2024 at 7:10 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Tue, Jan 02, 2024 at 02:07:58PM +0000, Bertrand Drouvot wrote:
> > > +           <literal>wal_level_insufficient</literal> means that the
> > > +           <xref linkend="guc-wal-level"/> is insufficient on the primary
> > > +           server.
> > >
> > > I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I think it's
> > > better to directly mention it is linked to the primary (without the need to refer
> > > to the documentation) and that the fact that it is "insufficient" is more or less
> > > implicit.
> > >
> > > Basically I think that with "primary_wal_level" one would need to refer to the doc
> > > less frequently than with "wal_level_insufficient".
> >
> > I can see your point, but wal_level_insufficient speaks a bit more to
> > me because of its relationship with the GUC setting.   Something like
> > wal_level_insufficient_on_primary may speak better, but that's also
> > quite long.  I'm OK with what the patch does.
> >
>
> Thanks, I also prefer "wal_level_insufficient". To me
> "primary_wal_level" sounds more along the lines of a GUC name than the
> conflict_reason. The other names that come to mind are
> "wal_level_lower_than_required", "wal_level_lower",
> "wal_level_lesser_than_required", "wal_level_lesser" but I feel
> "wal_level_insufficient" sounds better than these. Having said that, I
> am open to any of these or better options for this conflict_reason.
>
> > +       as invalidated. Possible values are:
> > +        <itemizedlist spacing="compact">
> > Higher-level nit: indentation seems to be one space off here.
> >
>
> Thanks, fixed in the attached patch.

I have marked the commitfest entry to the committed state as the patch
is committed.

Regards,
Vignesh