Обсуждение: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

Поиск
Список
Период
Сортировка

doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear hackers,

This is a fork thread from [1]. While analyzing codes I noticed that UPDATE and
DELETE cannot be replicated when REPLICA IDENTITY is FULL and the table has datatype
which does not have the operator class of Btree. I thnk this restriction is not
documented but should be. PSA the patch to add that. Thought?

[1]:
https://www.postgresql.org/message-id/TYAPR01MB586687A51AB511E5A7F7D3E6F526A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Peter Smith
Дата:
On Mon, Jul 10, 2023 at 1:33 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> This is a fork thread from [1]. While analyzing codes I noticed that UPDATE and
> DELETE cannot be replicated when REPLICA IDENTITY is FULL and the table has datatype
> which does not have the operator class of Btree. I thnk this restriction is not
> documented but should be. PSA the patch to add that. Thought?
>
> [1]:
https://www.postgresql.org/message-id/TYAPR01MB586687A51AB511E5A7F7D3E6F526A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
>

Hi.

+1 for the patch.

Here are some minor review comments:

======

1.
SUGGESTION (minor reword)
If the published table specifies <literal>REPLICA IDENTITY
FULL</literal> but the table includes an attribute whose datatype is
not an operator class of Btree, then <literal>UPDATE</literal> and
<literal>DELETE</literal> operations cannot be replicated. To make it
work, a primary key should be defined on the subscriber table, or a
different appropriate replica identity must be specified.

2.
Maybe "REPLICA IDENTITY FULL" should have a link, like from this [1] page.

------
[1] 31.1 Publication =
https://www.postgresql.org/docs/current/logical-replication-publication.html

Kind Regards,
Peter Smith.
Fujitsu Australia



RE: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Peter,

Thanks for checking! PSA new version.

> 1.
> SUGGESTION (minor reword)
> If the published table specifies <literal>REPLICA IDENTITY
> FULL</literal> but the table includes an attribute whose datatype is
> not an operator class of Btree, then <literal>UPDATE</literal> and
> <literal>DELETE</literal> operations cannot be replicated. To make it
> work, a primary key should be defined on the subscriber table, or a
> different appropriate replica identity must be specified.

Seems better, fixed.

> 2.
> Maybe "REPLICA IDENTITY FULL" should have a link, like from this [1] page.

Added.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Amit Kapila
Дата:
On Mon, Jul 10, 2023 at 2:33 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>

     If the published table specifies
+     <link linkend="sql-altertable-replica-identity-full"><literal>REPLICA
IDENTITY FULL</literal></link>
+     but the table includes an attribute whose datatype is not an operator
+     class of Btree,

Isn't the same true for the hash operator class family as well? Can we
slightly change the line as: "... the table includes an attribute
whose datatype doesn't have an equality operator defined for it..".
Also, I find the proposed wording a bit odd, can we swap the sentence
to say something like: "The UPDATE and DELETE operations cannot be
replicated for the published tables that specifies REPLICA IDENTITY
FULL but the table includes an attribute whose datatype doesn't have
an equality operator defined for it on the subscriber."?

--
With Regards,
Amit Kapila.



Re:doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Sergei Kornilov
Дата:
Hello

Is this restriction only for the subscriber?

If we have not changed the replica identity and there is no primary key, then we forbid update and delete on the
publicationside (a fairly common usage error at the beginning of using publications).
 
If we have replica identity FULL (the table has such a column), then on the subscription side, update and delete will
beperformed. But we will not be able to apply them on a subscription. Right?
 

This is an important difference for real use, when the subscriber is not necessarily postgresql - for example,
debezium.

regards, Sergei



Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Amit Kapila
Дата:
On Mon, Jul 10, 2023 at 4:33 PM Sergei Kornilov <sk@zsrv.org> wrote:
>
> Is this restriction only for the subscriber?
>
> If we have not changed the replica identity and there is no primary key, then we forbid update and delete on the
publicationside (a fairly common usage error at the beginning of using publications). 
> If we have replica identity FULL (the table has such a column), then on the subscription side, update and delete will
beperformed. 
>

In the above sentence, do you mean the publisher side?

>
 But we will not be able to apply them on a subscription. Right?
>

If your previous sentence talks about the publisher and this sentence
about the subscriber then what you are saying is correct. You can see
the example in the email [1].

> This is an important difference for real use, when the subscriber is not necessarily postgresql - for example,
debezium.
>

Can you explain the difference and problem you are seeing? As per my
understanding, this is the behavior from the time logical replication
has been introduced.

[1] -
https://www.postgresql.org/message-id/TYAPR01MB5866C7B6086EB74918910F74F527A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.



Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Sergei Kornilov
Дата:
>> Is this restriction only for the subscriber?
>>
>> If we have not changed the replica identity and there is no primary key, then we forbid update and delete on the
publicationside (a fairly common usage error at the beginning of using publications).
 
>> If we have replica identity FULL (the table has such a column), then on the subscription side, update and delete
willbe performed.
 
> 
> In the above sentence, do you mean the publisher side?

Yep, sorry.

> But we will not be able to apply them on a subscription. Right?
> 
> If your previous sentence talks about the publisher and this sentence
> about the subscriber then what you are saying is correct. You can see
> the example in the email [1].

Thank you

>> This is an important difference for real use, when the subscriber is not necessarily postgresql - for example,
debezium.
> 
> Can you explain the difference and problem you are seeing? As per my
> understanding, this is the behavior from the time logical replication
> has been introduced.

The difference is that if it's a subscriber-only restriction, then it won't automatically apply to anyone with a
non-postgresqlsubscriber.
 
But if suddenly this would be a limitation of the publisher - then it will automatically apply to everyone, regardless
ofwhich subscriber is used.
 
(and it's a completely different problem if the restriction affects the update/delete themselves, not only their
replication.Like as default replica identity on table without primary key, not in this case)
 

So, I suggest to mention subscriber explicitly:

+     class of Btree, then <literal>UPDATE</literal> and <literal>DELETE</literal>
-      operations cannot be replicated.
+     operations cannot be applied on subscriber.

Another example of difference:
Debezium users sometimes ask to set identity to FULL to get access to old values:
https://stackoverflow.com/a/59820210/10983392
However, identity FULL is described in the documentation as:
https://www.postgresql.org/docs/current/logical-replication-publication.html

> If the table does not have any suitable key, then it can be set to replica identity “full”, which means the entire
rowbecomes the key. This, however, is very inefficient and should only be used as a fallback if no other solution is
possible.

But not mentioned, this would only be "very inefficient" for the subscriber, or would have an huge impact on the
publishertoo (besides writing more WAL).
 

regards, Sergei



Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Amit Kapila
Дата:
On Mon, Jul 10, 2023 at 7:26 PM Sergei Kornilov <sk@zsrv.org> wrote:
>
> >> Is this restriction only for the subscriber?
> >>
> >> If we have not changed the replica identity and there is no primary key, then we forbid update and delete on the
publicationside (a fairly common usage error at the beginning of using publications). 
> >> If we have replica identity FULL (the table has such a column), then on the subscription side, update and delete
willbe performed. 
> >
> > In the above sentence, do you mean the publisher side?
>
> Yep, sorry.
>
> > But we will not be able to apply them on a subscription. Right?
> >
> > If your previous sentence talks about the publisher and this sentence
> > about the subscriber then what you are saying is correct. You can see
> > the example in the email [1].
>
> Thank you
>
> >> This is an important difference for real use, when the subscriber is not necessarily postgresql - for example,
debezium.
> >
> > Can you explain the difference and problem you are seeing? As per my
> > understanding, this is the behavior from the time logical replication
> > has been introduced.
>
> The difference is that if it's a subscriber-only restriction, then it won't automatically apply to anyone with a
non-postgresqlsubscriber. 
> But if suddenly this would be a limitation of the publisher - then it will automatically apply to everyone,
regardlessof which subscriber is used. 
> (and it's a completely different problem if the restriction affects the update/delete themselves, not only their
replication.Like as default replica identity on table without primary key, not in this case) 
>
> So, I suggest to mention subscriber explicitly:
>
> +     class of Btree, then <literal>UPDATE</literal> and <literal>DELETE</literal>
> -      operations cannot be replicated.
> +     operations cannot be applied on subscriber.
>
> Another example of difference:
> Debezium users sometimes ask to set identity to FULL to get access to old values:
https://stackoverflow.com/a/59820210/10983392
> However, identity FULL is described in the documentation as:
https://www.postgresql.org/docs/current/logical-replication-publication.html
>

After seeing this, I am thinking about whether we add this restriction
on the Subscription page [1] or Restrictions page [2] as proposed. Do
you others have any preference?

[1] - https://www.postgresql.org/docs/devel/logical-replication-subscription.html
[2] - https://www.postgresql.org/docs/devel/logical-replication-restrictions.html

--
With Regards,
Amit Kapila.



RE: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit,

> Isn't the same true for the hash operator class family as well?

True. I didn't write it on purpose because I didn't know the operator which is 
operator class for BTree but not for Hash. But I agreed to clarify it.

> Can we
> slightly change the line as: "... the table includes an attribute
> whose datatype doesn't have an equality operator defined for it..".

Hmm, this suggestion is dubious for me. Regarding the point datatype, it has the
"same as" operator [1]. E.g., following SQL returns true.

```
postgres=# select point '(1, 1)' ~= point '(1, 1)';
 ?column? 
----------
 t
(1 row)
```

The reason why they cannot be supported by tuples_equal() is that lookup_type_cache()
only checks the operator classes for Btree and Hash. ~= does not defined as the class.

> Also, I find the proposed wording a bit odd, can we swap the sentence
> to say something like: "The UPDATE and DELETE operations cannot be
> replicated for the published tables that specifies REPLICA IDENTITY
> FULL but the table includes an attribute whose datatype doesn't have
> an equality operator defined for it on the subscriber."?

Swapped. But based on above reply, I did not completely use your suggestion.

[1]: https://www.postgresql.org/docs/devel/functions-geometry.html

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

RE: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Sergei,

Thank you for giving comment!

The restriction is only for subscriber: the publisher can publish the changes
to downstream under the condition, but the subscriber cannot apply that.

> So, I suggest to mention subscriber explicitly:
> 
> +     class of Btree, then <literal>UPDATE</literal> and
> <literal>DELETE</literal>
> -      operations cannot be replicated.
> +     operations cannot be applied on subscriber.

I accepted the comment. Please see [1].

[1]:
https://www.postgresql.org/message-id/TYAPR01MB58664DB6ECA653A6922B3FE3F531A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


RE: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit,

> After seeing this, I am thinking about whether we add this restriction
> on the Subscription page [1] or Restrictions page [2] as proposed. Do
> you others have any preference?
> 
> [1] -
> https://www.postgresql.org/docs/devel/logical-replication-subscription.html
> [2] -
> https://www.postgresql.org/docs/devel/logical-replication-restrictions.html

Thanks for giving suggestion. But I still think it should be at "Restrictions" page
because all the limitation has been listed that page.
Moreover, the condition of this limitation is not closed to subscriber - the setup
on publisher is also related. I think such descriptions it may cause readers
to be confused.


But anyway, I have never been in mind such a point of view.
Maybe I should hear Sergei's opinion. Thought?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Amit Kapila
Дата:
On Tue, Jul 11, 2023 at 12:30 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit,
>
> > Isn't the same true for the hash operator class family as well?
>
> True. I didn't write it on purpose because I didn't know the operator which is
> operator class for BTree but not for Hash. But I agreed to clarify it.
>
> > Can we
> > slightly change the line as: "... the table includes an attribute
> > whose datatype doesn't have an equality operator defined for it..".
>
> Hmm, this suggestion is dubious for me. Regarding the point datatype, it has the
> "same as" operator [1]. E.g., following SQL returns true.
>
> ```
> postgres=# select point '(1, 1)' ~= point '(1, 1)';
>  ?column?
> ----------
>  t
> (1 row)
> ```
>
> The reason why they cannot be supported by tuples_equal() is that lookup_type_cache()
> only checks the operator classes for Btree and Hash. ~= does not defined as the class.
>

Fair enough, but the part of the line:"..  whose datatype is not an
operator class of Btree or Hash." doesn't appear very clear to me.
Because it sounds like we are checking whether datatype has any
operator class for btree or hash access methods but we are actually
checking if there is an equality operator (function) defined in the
default op class for those access methods. Am, I missing something?

--
With Regards,
Amit Kapila.



Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Sergei Kornilov
Дата:
Hello

I think it's appropriate to add on the restrictions page. (But mentioning that this restriction is only for
subscriber)

If the list were larger, then the restrictions page could be divided into publisher and subscriber restrictions. But
notfor one very specific restriction.
 

regards, Sergei



Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Amit Kapila
Дата:
On Tue, Jul 11, 2023 at 2:17 PM Sergei Kornilov <sk@zsrv.org> wrote:
>
> I think it's appropriate to add on the restrictions page. (But mentioning that this restriction is only for
subscriber)
>
> If the list were larger, then the restrictions page could be divided into publisher and subscriber restrictions. But
notfor one very specific restriction. 
>

Okay, how about something like: "The UPDATE and DELETE operations
cannot be applied on the subscriber for the published tables that
specify REPLICA IDENTITY FULL when the table has attributes with
datatypes (e.g point or box) that don't have a default operator class
for Btree or Hash. This won't be a problem if the table has a primary
key or replica identity defined for it."?

--
With Regards,
Amit Kapila.



RE: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, Sergei,

> > I think it's appropriate to add on the restrictions page. (But mentioning that this
> restriction is only for subscriber)
> >
> > If the list were larger, then the restrictions page could be divided into publisher
> and subscriber restrictions. But not for one very specific restriction.
> >
> 
> Okay, how about something like: "The UPDATE and DELETE operations
> cannot be applied on the subscriber for the published tables that
> specify REPLICA IDENTITY FULL when the table has attributes with
> datatypes (e.g point or box) that don't have a default operator class
> for Btree or Hash. This won't be a problem if the table has a primary
> key or replica identity defined for it."?

Thanks for discussing and giving suggestions. But it seems that the first
sentence is difficult to read for me. How about attached?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Amit Kapila
Дата:
On Fri, Jul 14, 2023 at 2:15 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > > I think it's appropriate to add on the restrictions page. (But mentioning that this
> > restriction is only for subscriber)
> > >
> > > If the list were larger, then the restrictions page could be divided into publisher
> > and subscriber restrictions. But not for one very specific restriction.
> > >
> >
> > Okay, how about something like: "The UPDATE and DELETE operations
> > cannot be applied on the subscriber for the published tables that
> > specify REPLICA IDENTITY FULL when the table has attributes with
> > datatypes (e.g point or box) that don't have a default operator class
> > for Btree or Hash. This won't be a problem if the table has a primary
> > key or replica identity defined for it."?
>
> Thanks for discussing and giving suggestions. But it seems that the first
> sentence is difficult to read for me. How about attached?
>

The last line seems repetitive to me. So, I have removed it. Apart
from that patch looks good to me. Sergie, Peter, and others, any
thoughts?

--
With Regards,
Amit Kapila.

Вложения

Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Peter Smith
Дата:
On Sat, Jul 15, 2023 at 2:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 14, 2023 at 2:15 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > > > I think it's appropriate to add on the restrictions page. (But mentioning that this
> > > restriction is only for subscriber)
> > > >
> > > > If the list were larger, then the restrictions page could be divided into publisher
> > > and subscriber restrictions. But not for one very specific restriction.
> > > >
> > >
> > > Okay, how about something like: "The UPDATE and DELETE operations
> > > cannot be applied on the subscriber for the published tables that
> > > specify REPLICA IDENTITY FULL when the table has attributes with
> > > datatypes (e.g point or box) that don't have a default operator class
> > > for Btree or Hash. This won't be a problem if the table has a primary
> > > key or replica identity defined for it."?
> >
> > Thanks for discussing and giving suggestions. But it seems that the first
> > sentence is difficult to read for me. How about attached?
> >
>
> The last line seems repetitive to me. So, I have removed it. Apart
> from that patch looks good to me. Sergie, Peter, and others, any
> thoughts?

The v5 patch LGTM.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Önder Kalacı
Дата:
Hi,

>
> The last line seems repetitive to me. So, I have removed it. Apart
> from that patch looks good to me. Sergie, Peter, and others, any
> thoughts?

The v5 patch LGTM.

 
Overall looks good to me as well. Please consider the following as an optional improvement.

My only minor concern here is the use of the term "default operator class". It is accurate to use it. However, as far as I know, not many users can follow that easily. I think the "pkey/repl full" suggestion gives some tip, but I wonder if we add something like the following to the text such that users can understand more:

 do not have a default operator class for B-tree or Hash.
+ If  there is no default operator class, usually the type does not have an equality operator.
However,  this limitation ..

Thanks,
Onder

Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Amit Kapila
Дата:
On Mon, Jul 17, 2023 at 11:51 AM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>> >
>> > The last line seems repetitive to me. So, I have removed it. Apart
>> > from that patch looks good to me. Sergie, Peter, and others, any
>> > thoughts?
>>
>> The v5 patch LGTM.
>>
>
> Overall looks good to me as well. Please consider the following as an optional improvement.
>
> My only minor concern here is the use of the term "default operator class". It is accurate to use it. However, as far
asI know, not many users can follow that easily. I think the "pkey/repl full" suggestion gives some tip, but I wonder
ifwe add something like the following to the text such that users can understand more: 
>
>>  do not have a default operator class for B-tree or Hash.
>>
>> + If  there is no default operator class, usually the type does not have an equality operator.
>>

This sounds a bit generic to me. If required, we can give an example
so that it is easier to understand. But OTOH, I see that we use
"default operator class" in the docs and error messages, so this
should be probably okay.


--
With Regards,
Amit Kapila.



Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

От
Amit Kapila
Дата:
On Mon, Jul 17, 2023 at 11:51 AM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>> >
>> > The last line seems repetitive to me. So, I have removed it. Apart
>> > from that patch looks good to me. Sergie, Peter, and others, any
>> > thoughts?
>>
>> The v5 patch LGTM.
>>
>
> Overall looks good to me as well. Please consider the following as an optional improvement.
>

Pushed. Thanks for looking into this.

--
With Regards,
Amit Kapila.