Обсуждение: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

Поиск
Список
Период
Сортировка

pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

От
Nikhil Benesch
Дата:
While working on Materialize's streaming logical replication from Postgres [0],
my colleagues Sean Loiselle and Petros Angelatos (CC'd) discovered today what
appears to be a correctness bug in pgoutput, introduced in v15.

The problem goes like this. A table with REPLICA IDENTITY FULL and some
data in it...

    CREATE TABLE t (a int);
    ALTER TABLE t REPLICA IDENTITY FULL;
    INSERT INTO t VALUES (1), (2), (3), ...;

...undergoes a schema change to add a new column with a default:

    ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL;

PostgreSQL is smart and does not rewrite the entire table during the schema
change. Instead it updates the tuple description to indicate to future readers
of the table that if `b` is missing, it should be filled in with the default
value, `false`.

Unfortunately, since v15, pgoutput mishandles missing attributes. If a
downstream server is subscribed to changes from t via the pgoutput plugin, when
a row with a missing attribute is updated, e.g.:

    UPDATE t SET a = 2 WHERE a = 1

pgoutput will incorrectly report b's value as NULL in the old tuple, rather than
false. Using the same example:

    old: a=1, b=NULL
    new: a=2, b=true

The subscriber will ignore the update (as it has no row with values
a=1, b=NULL), and thus the subscriber's copy of `t` will become out of sync with
the publisher's.

I bisected the problem to 52e4f0cd4 [1], which introduced row filtering for
publications. The problem appears to be the use of CreateTupleDescCopy where
CreateTupleDescCopyConstr is required, as the former drops the constraints
in the tuple description (specifically, the default value constraint) on the
floor.

I've attached a patch which both fixes the issue and includes a test. I've
verified that the test fails against the current master and passes against
the patched version.

I'm relatively unfamiliar with the project norms here, but assuming the patch is
acceptable, this strikes me as important enough to warrant a backport to both
v15 and v16.

[0]: https://materialize.com/docs/sql/create-source/postgres
[1]: https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78

Вложения

Re: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

От
Amit Kapila
Дата:
On Thu, Nov 23, 2023 at 1:10 PM Nikhil Benesch <nikhil.benesch@gmail.com> wrote:
>
> While working on Materialize's streaming logical replication from Postgres [0],
> my colleagues Sean Loiselle and Petros Angelatos (CC'd) discovered today what
> appears to be a correctness bug in pgoutput, introduced in v15.
>
> The problem goes like this. A table with REPLICA IDENTITY FULL and some
> data in it...
>
>     CREATE TABLE t (a int);
>     ALTER TABLE t REPLICA IDENTITY FULL;
>     INSERT INTO t VALUES (1), (2), (3), ...;
>
> ...undergoes a schema change to add a new column with a default:
>
>     ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL;
>
> PostgreSQL is smart and does not rewrite the entire table during the schema
> change. Instead it updates the tuple description to indicate to future readers
> of the table that if `b` is missing, it should be filled in with the default
> value, `false`.
>
> Unfortunately, since v15, pgoutput mishandles missing attributes. If a
> downstream server is subscribed to changes from t via the pgoutput plugin, when
> a row with a missing attribute is updated, e.g.:
>
>     UPDATE t SET a = 2 WHERE a = 1
>
> pgoutput will incorrectly report b's value as NULL in the old tuple, rather than
> false.
>

Thanks, I could reproduce this behavior. I'll look into your patch.

> Using the same example:
>
>     old: a=1, b=NULL
>     new: a=2, b=true
>
> The subscriber will ignore the update (as it has no row with values
> a=1, b=NULL), and thus the subscriber's copy of `t` will become out of sync with
> the publisher's.
>
> I bisected the problem to 52e4f0cd4 [1], which introduced row filtering for
> publications. The problem appears to be the use of CreateTupleDescCopy where
> CreateTupleDescCopyConstr is required, as the former drops the constraints
> in the tuple description (specifically, the default value constraint) on the
> floor.
>
> I've attached a patch which both fixes the issue and includes a test. I've
> verified that the test fails against the current master and passes against
> the patched version.
>
> I'm relatively unfamiliar with the project norms here, but assuming the patch is
> acceptable, this strikes me as important enough to warrant a backport to both
> v15 and v16.
>

Right.

--
With Regards,
Amit Kapila.



Re: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

От
Amit Kapila
Дата:
On Thu, Nov 23, 2023 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Nov 23, 2023 at 1:10 PM Nikhil Benesch <nikhil.benesch@gmail.com> wrote:
> >
> > While working on Materialize's streaming logical replication from Postgres [0],
> > my colleagues Sean Loiselle and Petros Angelatos (CC'd) discovered today what
> > appears to be a correctness bug in pgoutput, introduced in v15.
> >
> > The problem goes like this. A table with REPLICA IDENTITY FULL and some
> > data in it...
> >
> >     CREATE TABLE t (a int);
> >     ALTER TABLE t REPLICA IDENTITY FULL;
> >     INSERT INTO t VALUES (1), (2), (3), ...;
> >
> > ...undergoes a schema change to add a new column with a default:
> >
> >     ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL;
> >
> > PostgreSQL is smart and does not rewrite the entire table during the schema
> > change. Instead it updates the tuple description to indicate to future readers
> > of the table that if `b` is missing, it should be filled in with the default
> > value, `false`.
> >
> > Unfortunately, since v15, pgoutput mishandles missing attributes. If a
> > downstream server is subscribed to changes from t via the pgoutput plugin, when
> > a row with a missing attribute is updated, e.g.:
> >
> >     UPDATE t SET a = 2 WHERE a = 1
> >
> > pgoutput will incorrectly report b's value as NULL in the old tuple, rather than
> > false.
> >
>
> Thanks, I could reproduce this behavior. I'll look into your patch.
>

I verified your fix is good and made minor modifications in the
comment. Note, that the test doesn't work for PG15, needs minor
modifications.

--
With Regards,
Amit Kapila.

Вложения

RE: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

От
"Zhijie Hou (Fujitsu)"
Дата:
On Friday, November 24, 2023 7:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Thu, Nov 23, 2023 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Thu, Nov 23, 2023 at 1:10 PM Nikhil Benesch <nikhil.benesch@gmail.com>
> wrote:
> > >
> > > While working on Materialize's streaming logical replication from
> > > Postgres [0], my colleagues Sean Loiselle and Petros Angelatos
> > > (CC'd) discovered today what appears to be a correctness bug in pgoutput,
> introduced in v15.
> > >
> > > The problem goes like this. A table with REPLICA IDENTITY FULL and
> > > some data in it...
> > >
> > >     CREATE TABLE t (a int);
> > >     ALTER TABLE t REPLICA IDENTITY FULL;
> > >     INSERT INTO t VALUES (1), (2), (3), ...;
> > >
> > > ...undergoes a schema change to add a new column with a default:
> > >
> > >     ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL;
> > >
> > > PostgreSQL is smart and does not rewrite the entire table during the
> > > schema change. Instead it updates the tuple description to indicate
> > > to future readers of the table that if `b` is missing, it should be
> > > filled in with the default value, `false`.
> > >
> > > Unfortunately, since v15, pgoutput mishandles missing attributes. If
> > > a downstream server is subscribed to changes from t via the pgoutput
> > > plugin, when a row with a missing attribute is updated, e.g.:
> > >
> > >     UPDATE t SET a = 2 WHERE a = 1
> > >
> > > pgoutput willz incorrectly report b's value as NULL in the old tuple,
> > > rather than false.
> > >
> >
> > Thanks, I could reproduce this behavior. I'll look into your patch.
> >
> 
> I verified your fix is good and made minor modifications in the comment. Note,
> that the test doesn't work for PG15, needs minor modifications.

Thank you for fixing and reviewing the fix!

The fix also looks good to me. I verified that it can fix the problem in
HEAD ~ PG15 and the added tap test can detect the problem without the fix. I
tried to rebase the patch on PG15, and combines some queries into one safe_sql
block to simplify the code. Here are the patches for all branches.

Best Regards,
Hou zj


Вложения

Re: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

От
Nikhil Benesch
Дата:
Thank you both for reviewing. The updated patch set LGTM.


Nikhil

On Fri, Nov 24, 2023 at 7:21 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Friday, November 24, 2023 7:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Nov 23, 2023 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > >
> > > On Thu, Nov 23, 2023 at 1:10 PM Nikhil Benesch <nikhil.benesch@gmail.com>
> > wrote:
> > > >
> > > > While working on Materialize's streaming logical replication from
> > > > Postgres [0], my colleagues Sean Loiselle and Petros Angelatos
> > > > (CC'd) discovered today what appears to be a correctness bug in pgoutput,
> > introduced in v15.
> > > >
> > > > The problem goes like this. A table with REPLICA IDENTITY FULL and
> > > > some data in it...
> > > >
> > > >     CREATE TABLE t (a int);
> > > >     ALTER TABLE t REPLICA IDENTITY FULL;
> > > >     INSERT INTO t VALUES (1), (2), (3), ...;
> > > >
> > > > ...undergoes a schema change to add a new column with a default:
> > > >
> > > >     ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL;
> > > >
> > > > PostgreSQL is smart and does not rewrite the entire table during the
> > > > schema change. Instead it updates the tuple description to indicate
> > > > to future readers of the table that if `b` is missing, it should be
> > > > filled in with the default value, `false`.
> > > >
> > > > Unfortunately, since v15, pgoutput mishandles missing attributes. If
> > > > a downstream server is subscribed to changes from t via the pgoutput
> > > > plugin, when a row with a missing attribute is updated, e.g.:
> > > >
> > > >     UPDATE t SET a = 2 WHERE a = 1
> > > >
> > > > pgoutput willz incorrectly report b's value as NULL in the old tuple,
> > > > rather than false.
> > > >
> > >
> > > Thanks, I could reproduce this behavior. I'll look into your patch.
> > >
> >
> > I verified your fix is good and made minor modifications in the comment. Note,
> > that the test doesn't work for PG15, needs minor modifications.
>
> Thank you for fixing and reviewing the fix!
>
> The fix also looks good to me. I verified that it can fix the problem in
> HEAD ~ PG15 and the added tap test can detect the problem without the fix. I
> tried to rebase the patch on PG15, and combines some queries into one safe_sql
> block to simplify the code. Here are the patches for all branches.
>
> Best Regards,
> Hou zj
>



Re: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

От
Amit Kapila
Дата:
On Fri, Nov 24, 2023 at 7:23 PM Nikhil Benesch <nikhil.benesch@gmail.com> wrote:
>
> Thank you both for reviewing. The updated patch set LGTM.
>

Pushed!

--
With Regards,
Amit Kapila.



Re: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

От
Nikhil Benesch
Дата:
Thank you for turning this around so quickly!