Обсуждение: Logical replication can be broken by domain constraint with NOT VALIDoption

Поиск
Список
Период
Сортировка

Logical replication can be broken by domain constraint with NOT VALIDoption

От
Andrey Lepikhov
Дата:
Hi,

During patch development I ran into a small problem (see attachment, 
fail_replication.sh):
1. We have a table with logical replication to another node.
2. On the master and replica add such "NOT VALID" domain constraint on 
the table  that some tuples  violates the constraint.
3. UPDATE the table: set value of the tuple that violates constraint to 
correct value.
4. That's all!

The reason for this problem is that on UPDATE walsender sends old tuple 
value (that violates the constraint) with new version (satisfied the 
constraint).
Replication worker at replica node restores slot from transfer 
representation. During this process domain checking constraint and 
returns an ERROR.
Because we can't apply WAL record of the UPDATE command, logical 
replication will be stopped at all.
As I understand, this problem can be reproduced in all postgres versions 
with logical replication feature.
This problem can be solved by many ways and approaches. I wrote the 
patch to solve this problem (see in attachment) by the shortest way.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Вложения

Re: Logical replication can be broken by domain constraint with NOT VALID option

От
Tom Lane
Дата:
Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
> During patch development I ran into a small problem (see attachment, 
> fail_replication.sh):
> 1. We have a table with logical replication to another node.
> 2. On the master and replica add such "NOT VALID" domain constraint on 
> the table  that some tuples  violates the constraint.
> 3. UPDATE the table: set value of the tuple that violates constraint to 
> correct value.
> 4. That's all!

> The reason for this problem is that on UPDATE walsender sends old tuple 
> value (that violates the constraint) with new version (satisfied the 
> constraint).
> Replication worker at replica node restores slot from transfer 
> representation. During this process domain checking constraint and 
> returns an ERROR.

I'm not sure this is something we should attempt to fix.  There are
an infinite number of ways you can break logical replication by
presenting it with inconsistent data, and that's really what you've
done here.

> This problem can be solved by many ways and approaches. I wrote the 
> patch to solve this problem (see in attachment) by the shortest way.

That patch is certainly utterly unacceptable.  It'd allow the
receipient to accept data that violates the domain constraint.

The situation you're describing would probably best be handled by
not adding the constraint on the replica side until all the
bad data has been corrected (and replicated).

            regards, tom lane



Re: Logical replication can be broken by domain constraint with NOTVALID option

От
Andrey Lepikhov
Дата:

On 03/11/2019 20:42, Tom Lane wrote:
> Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
>> The reason for this problem is that on UPDATE walsender sends old tuple
>> value (that violates the constraint) with new version (satisfied the
>> constraint).
>> Replication worker at replica node restores slot from transfer
>> representation. During this process domain checking constraint and
>> returns an ERROR.
> 
> I'm not sure this is something we should attempt to fix.  There are
> an infinite number of ways you can break logical replication by
> presenting it with inconsistent data, and that's really what you've
> done here.

This problem reproduced by standard way from the documentation. I assume 
this inconsistency option is allowed by SQL standard because it has a 
practical usage.

> 
>> This problem can be solved by many ways and approaches. I wrote the
>> patch to solve this problem (see in attachment) by the shortest way.
> 
> That patch is certainly utterly unacceptable.  It'd allow the
> receipient to accept data that violates the domain constraint.

If this is the only reason, I propose a new version of the patch (see in 
attachment). It is satisfy the "Paranoid safety" rule.
> 
> The situation you're describing would probably best be handled by
> not adding the constraint on the replica side until all the
> bad data has been corrected (and replicated).

On any PostgreSQL-based multimaster system, this will be a problem.

--
regards,
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Вложения

Re: Logical replication can be broken by domain constraint with NOTVALID option

От
Euler Taveira
Дата:
Em dom., 3 de nov. de 2019 às 23:33, Andrey Lepikhov
<a.lepikhov@postgrespro.ru> escreveu:
>
> On 03/11/2019 20:42, Tom Lane wrote:
> > Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
> >> The reason for this problem is that on UPDATE walsender sends old tuple
> >> value (that violates the constraint) with new version (satisfied the
> >> constraint).
> >> Replication worker at replica node restores slot from transfer
> >> representation. During this process domain checking constraint and
> >> returns an ERROR.
> >
> > I'm not sure this is something we should attempt to fix.  There are
> > an infinite number of ways you can break logical replication by
> > presenting it with inconsistent data, and that's really what you've
> > done here.
>
> This problem reproduced by standard way from the documentation. I assume
> this inconsistency option is allowed by SQL standard because it has a
> practical usage.
>
Could you point out the problem in the documentation?

> >
> >> This problem can be solved by many ways and approaches. I wrote the
> >> patch to solve this problem (see in attachment) by the shortest way.
> >
> > That patch is certainly utterly unacceptable.  It'd allow the
> > receipient to accept data that violates the domain constraint.
>
> If this is the only reason, I propose a new version of the patch (see in
> attachment). It is satisfy the "Paranoid safety" rule.
> >
I don't think that is acceptable either. If you have different schemas
(even for a small period of time), you should handle it dropping and
recreating the constraints. Logical replication is far from a complete
feature. There should be cases that someone wants to enforce even the
FK constraints in the subscriber. I certainly wouldn't like to open
that can of worms. Relaxing constraints could lead to inconsistent
datasets across nodes. If you want to accept constraint violation,
drop the constraints.

> > The situation you're describing would probably best be handled by
> > not adding the constraint on the replica side until all the
> > bad data has been corrected (and replicated).
>
> On any PostgreSQL-based multimaster system, this will be a problem.
>
... if you do not replicate DDLs in the same order it occurs or if you
have different schemas.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: Logical replication can be broken by domain constraint with NOTVALID option

От
Andrey Lepikhov
Дата:

On 05/11/2019 20:21, Euler Taveira wrote:
> Em dom., 3 de nov. de 2019 às 23:33, Andrey Lepikhov
> <a.lepikhov@postgrespro.ru> escreveu:
>> If this is the only reason, I propose a new version of the patch (see in
>> attachment). It is satisfy the "Paranoid safety" rule.
> I don't think that is acceptable either. If you have different schemas
> (even for a small period of time), you should handle it dropping and
> recreating the constraints.

Changing schema is a big deal. But adding a constraint with "not valid" 
option can be used frequently. May be for change phone numbers format, 
for example.

> Logical replication is far from a complete
> feature. There should be cases that someone wants to enforce even the
> FK constraints in the subscriber. I certainly wouldn't like to open
> that can of worms. Relaxing constraints could lead to inconsistent
> datasets across nodes. If you want to accept constraint violation,
> drop the constraints.

May be logical replication is incomplete. But it is no argument to not 
fix an errors that we found.
In v2 version of the patch constraints are suppressed only for old 
version of the tuple that used for search in the heap and can't be 
applied. In this sense we do not relaxing any constraints.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company