Discussion: Problems with "pg.dropped" column after upgrade 9.5 to 9.6


Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Pavel Hanák
Date:
Hi,

I've got a very similar problem to the one described in the message
with the subject "got some errors after upgrade poestgresql from 9.5 to 9.6".

I'll try to describe this very strange behaviour.

I have a table A which has a "pg.dropped" attribute in the
pg_attribute catalog. It looks like:

  select attname, attnum
    from pg_attribute
   where attrelid = 'A'::regclass and attisdropped;

             attname            | attnum
 -------------------------------+--------
  ........pg.dropped.57........ |     57
 (1 row)
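For readers who want to reproduce the precondition, such "pg.dropped" entries appear whenever a column is dropped: ALTER TABLE only marks the attribute as dropped in pg_attribute, it does not rewrite the table. A minimal sketch (table and column names invented here):

```sql
-- DROP COLUMN keeps the attribute row in pg_attribute, renaming it to
-- "........pg.dropped.N........" and setting attisdropped = true.
CREATE TABLE t (a int, b int);
ALTER TABLE t DROP COLUMN b;

SELECT attname, attnum
  FROM pg_attribute
 WHERE attrelid = 't'::regclass AND attisdropped;
-- one row: attnum 2, attname ........pg.dropped.2........
```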

Now, I create an SQL function doing only an update on this table
when the boolean parameter of the function is True:

  CREATE OR REPLACE FUNCTION _test_sql_update(in do_update boolean)
  RETURNS VOID LANGUAGE sql VOLATILE AS $$
      update A
         set col = NULL
       where do_update;
  $$;

Now running:
 select _test_sql_update(False);

returns this error:
  ERROR:  table row type and query-specified row type do not match
  DETAIL:  Query provides a value for a dropped column at ordinal position 57.
  CONTEXT:  SQL function "_test_sql_update" statement 1

If I don't use the parameter in "where" and instead use the constant
False directly, everything works:

  CREATE OR REPLACE FUNCTION _test_sql_update(in do_update boolean)
  RETURNS VOID LANGUAGE sql VOLATILE AS $$
      update A
         set col = NULL
       where False;
  $$;

  select _test_sql_update(False);

  SQL=#  _test_sql
  -----------

  (1 row)

If I define the function in plpgsql, everything also works:

  CREATE OR REPLACE FUNCTION _test_plpgsql_update(in do_update boolean)
  RETURNS VOID LANGUAGE plpgsql VOLATILE AS $$
  BEGIN
     update A
       set col = NULL
     where do_update;
  END;
  $$;


My conclusion is:

The problem occurs only under these circumstances:

- PostgreSQL 9.6 (no problem in 9.5)

- an SQL function doing an update

- a boolean parameter of the function is used in the update command, and
  the updated table has some attisdropped attributes

Can anybody explain what is the problem?

Thanks
Pavel



Re: Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Pavel Stehule
Date:
2016-11-01 22:56 GMT+01:00 Pavel Hanák <hanak@is-it.eu>:

> Hi,
>
> I've got very the similar problem as described in the message
> with the subject "got some errors after upgrade poestgresql from 9.5 to
> 9.6".
>
> I'll try to describe this very strange behaviour.
>
> I have a table A which has some "pg.dropped" attribute in
> pg_attribute table. It looks like:
>
>   select attname, attnum
>     from pg_attribute
>    where attrelid = A::regclass and attisdropped;
>
>              attname            | attnum
>   -------------------------------+--------
>   ........pg.dropped.57........ |     57
>   (1 row)
>
> Now, I create SQL function doing only update on this table
> when the boolean parameter of the function is True:
>
>   CREATE OR REPLACE FUNCTION _test_sql_update(in do_update boolean)
>   RETURNS VOID LANGUAGE sql VOLATILE AS $$
>       update A
>          set col = NULL
>        where do_update;
>   $$;
>
> Now running:
>
>   select _test_sql_update(False);
>
> returns this error:
>
>   ERROR:  table row type and query-specified row type do not match
>   DETAIL:  Query provides a value for a dropped column at ordinal position 57.
>   CONTEXT:  SQL function "_test_sql_update" statement 1
>

This is a runtime error; the check is evaluated when the function returns
one or more rows.


>
> If I don't use the parameter in "where" and instead I use the constant
> False directly, everything works:
>
>   CREATE OR REPLACE FUNCTION _test_sql_update(in do_update boolean)
>   RETURNS VOID LANGUAGE sql VOLATILE AS $$
>       update A
>          set col = NULL
>        where False;
>   $$;
>
>   select _test_sql_update(False);
>
>   SQL=#  _test_sql
>   -----------
>
>   (1 row)
>

In this case the check is not evaluated because there are no rows in the
result.


>
> If I define the function as plpgsql, everything is also working:
>
>   CREATE OR REPLACE FUNCTION _test_plpgsql_update(in do_update boolean)
>   RETURNS VOID LANGUAGE plpgsql VOLATILE AS $$
>   BEGIN
>      update A
>        set col = NULL
>      where do_update;
>   END;
>   $$;
>
>
> My conclusion is:
>
> The problem occurs only under these circumstances:
>
> - Postgresql 9.6 (no problem in 9.5)
>
> - SQL function doing update
>
> - There is a boolean parameter of the function used in the update command
>   and the table which is updated has some attisdropped attributes
>
> Can anybody explain what is the problem?
>

The pipelines of SQL and PL/pgSQL functions are quite different; this is a
new regression in the 9.6 code.

Regards

Pavel



>
> Thanks
> Pavel
>

Re: Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Pavel Stehule
Date:
2016-11-01 22:56 GMT+01:00 Pavel Hanák <hanak@is-it.eu>:

> [... same report as above ...]
>
> Can anybody explain what is the problem?

Please, can you send a test case? I cannot reproduce this bug on master.

Regards

Pavel



>
> Thanks
> Pavel
>

Re: Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Michael Paquier
Date:
On Wed, Nov 2, 2016 at 3:32 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-11-01 22:56 GMT+01:00 Pavel Hanák <hanak@is-it.eu>:
>> Can anybody explain what is the problem?
>
> Please, can you send a test case? I cannot reproduce this bug on master.

I can reproduce the regression on master and 9.6:
DROP TABLE aa;
CREATE TABLE aa (a int, b int, c int);
CREATE OR REPLACE FUNCTION _test_sql_update(in do_update boolean)
    RETURNS VOID LANGUAGE sql VOLATILE AS $$
          update aa
           set a = NULL
          where do_update;
    $$;
INSERT INTO aa VALUES (generate_series(1,1,1));
INSERT INTO aa VALUES (generate_series(2,1,1));
INSERT INTO aa VALUES (generate_series(3,1,1));
ALTER TABLE aa DROP COLUMN b;
SELECT _test_sql_update(false);
SELECT _test_sql_update(true);

I am just digging into it... An interesting part is that removing the
WHERE clause makes the regression go away.
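To make that observation concrete, here is a hedged sketch of the no-WHERE variant (the function name is invented here), reusing the aa table from the script above:

```sql
-- Same repro setup as above, but with the parameterized WHERE clause
-- removed; this version runs without the "row type" error.
CREATE OR REPLACE FUNCTION _test_sql_update_nowhere(in do_update boolean)
    RETURNS VOID LANGUAGE sql VOLATILE AS $$
          update aa
             set a = NULL;
    $$;
SELECT _test_sql_update_nowhere(false);
```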
--
Michael

Re: Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Michael Paquier
Date:
On Wed, Nov 2, 2016 at 3:54 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I am just digging into it... An interesting part is that removing the
> WHERE clause makes the regression go away.

And one bisect later:
commit: 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f
author: Tom Lane <tgl@sss.pgh.pa.us>
date: Mon, 7 Mar 2016 15:58:22 -0500
Make the upper part of the planner work by generating and comparing Paths

In short, that's going to be fun to create a patch...
--
Michael

Re: Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> And one bisect later:
> commit: 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f
> author: Tom Lane <tgl@sss.pgh.pa.us>
> date: Mon, 7 Mar 2016 15:58:22 -0500
> Make the upper part of the planner work by generating and comparing Paths

Yeah.  The reimplementation I did in createplan.c causes the planner to
generate a targetlist that seems valid in practice, but doesn't pass the
rather simplistic test in ExecCheckPlanOutput.  (The latter is expecting
a simple null Const corresponding to a dropped column, but what it's
getting is a Var that references a null Const one plan level down.)

Not sure whether it's reasonable to make the planner jump through extra
hoops here, or complicate the null-for-dropped-cols test in
ExecCheckPlanOutput, or just delete that as not really needed.

            regards, tom lane

Re: Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> I can reproduce the regression on master and 9.6:
> DROP TABLE aa;
> CREATE TABLE aa (a int, b int, c int);
> CREATE OR REPLACE FUNCTION _test_sql_update(in do_update boolean)
>     RETURNS VOID LANGUAGE sql VOLATILE AS $$
>           update aa
>            set a = NULL
>           where do_update;
>     $$;
> ALTER TABLE aa DROP COLUMN b;

OK, I looked into this enough to identify exactly where it broke.
The targetlist as set up by preptlist.c looks like

    null Const    -- value being assigned to a
    null Const    -- placeholder for dropped b column
    Var for c    -- to copy updated row's previous c value forward
    Var for ctid    -- since ModifyTable will need the row TID (resjunk)

The plan tree needs to contain a SeqScan on aa, returning at least values
for c and ctid, plus a Result node that will gate execution based on the
"do_update" parameter, and then a ModifyTable on top.  For safety's sake,
the ModifyTable wants to check that the output of its child node matches
the table's current actual rowtype; in particular it wants to see that
the value being put into b is null, since it knows that column is dropped.

In previous releases, the plan tree returned by create_plan() was just
the SeqScan and Result, and the tlists that those output contained just
"c" and "ctid", ie only the required Vars.  Then grouping_planner pasted
the final tlist onto just the top plan node (the Result).

As of 9.6, create_plan is aware that it ought to return the final tlist,
and it sets up the SeqScan to do that --- then, more or less as an
afterthought (see create_gating_plan), it pastes a Result on top
producing the same tlist.  So at this point we have Const,Const,Var,Var
in both plan nodes, and you'd think that should still be fine.  But it
isn't, because setrefs.c processes the Result's tlist to replace Vars
in it with references to the outputs of the child SeqScan.  And it will
also notice that the Consts match expressions in the child, and replace
them with Vars.  So now the Result's tlist contains four Vars referencing
the child output columns, and that makes ExecCheckPlanOutput unhappy.

The fact that this didn't happen before seems to hinge entirely on the
fact that in prior versions create_plan always returned tlists containing
only Vars, so that there was no opportunity for setrefs.c to match a Const
to lower-level output unless there was more than one level of Plan node
stuck on top of that result within planner.c --- and in an INSERT or
UPDATE, there would be nothing stuck on top except the ModifyTable.

So I'm pretty tempted to fix it as per attached, that is, teach setrefs.c
that it's silly to try to convert a Const into a Var.  It is sane to
replace more-complicated expressions with Vars, because that means we save
re-evaluating those expressions.  But a quick look at ExecEvalConst and
ExecEvalVar shows that a Const is strictly cheaper to execute than a Var,
so doing such a conversion is actually a pessimization.

This doesn't seem to make for any changes in the core regression tests,
but postgres_fdw notices, eg here:

***************
*** 2450,2456 ****
     Output: (count(c2)), c2, (5), (7.0), (9)
     Sort Key: ft1.c2
     ->  Foreign Scan
!          Output: (count(c2)), c2, (5), (7.0), (9)
           Relations: Aggregate on (public.ft1)
          Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY c2, 5::integer, 9::integer
 (7 rows)
  (7 rows)
--- 2450,2456 ----
     Output: (count(c2)), c2, (5), (7.0), (9)
     Sort Key: ft1.c2
     ->  Foreign Scan
!          Output: (count(c2)), c2, 5, 7.0, 9
           Relations: Aggregate on (public.ft1)
          Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY c2, 5::integer, 9::integer
 (7 rows)
  (7 rows)

Losing the indirection doesn't seem like a problem, but I wonder if
anyone's concerned about it.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d10a983..cd10794 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** search_indexed_tlist_for_non_var(Node *n
*** 2010,2015 ****
--- 2010,2024 ----
  {
      TargetEntry *tle;

+     /*
+      * If it's a simple Const, replacing it with a Var is silly, even if there
+      * happens to be an identical Const below.  What's more, doing so could
+      * confuse some places in the executor that expect to see simple Consts
+      * for, eg, dropped columns.
+      */
+     if (IsA(node, Const))
+         return NULL;
+
      tle = tlist_member(node, itlist->tlist);
      if (tle)
      {

Re: Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Tom Lane
Date:
I wrote:
> So I'm pretty tempted to fix it as per attached, that is, teach setrefs.c
> that it's silly to try to convert a Const into a Var.  It is sane to
> replace more-complicated expressions with Vars, because that means we save
> re-evaluating those expressions.  But a quick look at ExecEvalConst and
> ExecEvalVar shows that a Const is strictly cheaper to execute than a Var,
> so doing such a conversion is actually a pessimization.

After further poking around, I concluded that it'd be a good idea to make
a similar change in set_dummy_tlist_references(), to prevent a failure
if the top plan node below ModifyTable chances to be a Sort or other
non-projecting plan node.  I'm not sure such a case can arise today, but
I'm not sure it can't either.

With that, there are a couple more changes in regression test EXPLAIN
output.  A full patch that passes "make check-world" is attached.

I'm not sure if it's worth memorializing the specific test case discussed
in this thread as a regression test.  It depends enough on the behavior
of SQL function planning that I wouldn't have much confidence in it
continuing to test what we meant it to test going forward.

            regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2745ad5..7339b58 100644
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
*************** select count(c2) w, c2 x, 5 y, 7.0 z fro
*** 2447,2456 ****
                                                  QUERY PLAN
  ----------------------------------------------------------------------------------------------------------
   Sort
!    Output: (count(c2)), c2, (5), (7.0), (9)
     Sort Key: ft1.c2
     ->  Foreign Scan
!          Output: (count(c2)), c2, (5), (7.0), (9)
           Relations: Aggregate on (public.ft1)
           Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY c2, 5::integer, 9::integer
  (7 rows)
--- 2447,2456 ----
                                                  QUERY PLAN
  ----------------------------------------------------------------------------------------------------------
   Sort
!    Output: (count(c2)), c2, 5, 7.0, 9
     Sort Key: ft1.c2
     ->  Foreign Scan
!          Output: (count(c2)), c2, 5, 7.0, 9
           Relations: Aggregate on (public.ft1)
           Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY c2, 5::integer, 9::integer
  (7 rows)
*************** select count(*) from (select c5, count(c
*** 2499,2505 ****
   Aggregate
     Output: count(*)
     ->  Foreign Scan
!          Output: ft1.c5, (NULL::bigint), (sqrt((ft1.c2)::double precision))
           Filter: (((((avg(ft1.c1)) / (avg(ft1.c1))))::double precision * random()) <= '1'::double precision)
           Relations: Aggregate on (public.ft1)
          Remote SQL: SELECT c5, NULL::bigint, sqrt(c2), avg("C 1") FROM "S 1"."T 1" GROUP BY c5, (sqrt(c2)) HAVING ((avg("C 1") < 500::numeric))
--- 2499,2505 ----
   Aggregate
     Output: count(*)
     ->  Foreign Scan
!          Output: ft1.c5, NULL::bigint, (sqrt((ft1.c2)::double precision))
           Filter: (((((avg(ft1.c1)) / (avg(ft1.c1))))::double precision * random()) <= '1'::double precision)
           Relations: Aggregate on (public.ft1)
          Remote SQL: SELECT c5, NULL::bigint, sqrt(c2), avg("C 1") FROM "S 1"."T 1" GROUP BY c5, (sqrt(c2)) HAVING ((avg("C 1") < 500::numeric))
*************** select sum(q.a), count(q.b) from ft4 lef
*** 3139,3145 ****
                 ->  Subquery Scan on q
                       Output: q.a, q.b
                       ->  Foreign Scan
!                            Output: (13), (avg(ft1.c1)), (NULL::bigint)
                             Relations: Aggregate on ((public.ft2) LEFT JOIN (public.ft1))
                            Remote SQL: SELECT 13, avg(r1."C 1"), NULL::bigint FROM ("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
  (16 rows)
--- 3139,3145 ----
                 ->  Subquery Scan on q
                       Output: q.a, q.b
                       ->  Foreign Scan
!                            Output: 13, (avg(ft1.c1)), NULL::bigint
                             Relations: Aggregate on ((public.ft2) LEFT JOIN (public.ft1))
                            Remote SQL: SELECT 13, avg(r1."C 1"), NULL::bigint FROM ("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
  (16 rows)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d10a983..d91bc3b 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** set_dummy_tlist_references(Plan *plan, i
*** 1823,1828 ****
--- 1823,1841 ----
          Var           *oldvar = (Var *) tle->expr;
          Var           *newvar;

+         /*
+          * As in search_indexed_tlist_for_non_var(), we prefer to keep Consts
+          * as Consts, not Vars referencing Consts.  Here, there's no speed
+          * advantage to be had, but it makes EXPLAIN output look cleaner, and
+          * again it avoids confusing the executor.
+          */
+         if (IsA(oldvar, Const))
+         {
+             /* just reuse the existing TLE node */
+             output_targetlist = lappend(output_targetlist, tle);
+             continue;
+         }
+
          newvar = makeVar(OUTER_VAR,
                           tle->resno,
                           exprType((Node *) oldvar),
*************** search_indexed_tlist_for_non_var(Node *n
*** 2010,2015 ****
--- 2023,2038 ----
  {
      TargetEntry *tle;

+     /*
+      * If it's a simple Const, replacing it with a Var is silly, even if there
+      * happens to be an identical Const below; a Var is more expensive to
+      * execute than a Const.  What's more, replacing it could confuse some
+      * places in the executor that expect to see simple Consts for, eg,
+      * dropped columns.
+      */
+     if (IsA(node, Const))
+         return NULL;
+
      tle = tlist_member(node, itlist->tlist);
      if (tle)
      {
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index df7cba6..79e9969 100644
*** a/src/test/regress/expected/inherit.out
--- b/src/test/regress/expected/inherit.out
*************** ORDER BY thousand, tenthous;
*** 1426,1432 ****
     Sort Key: tenk1.thousand, tenk1.tenthous
     ->  Index Only Scan using tenk1_thous_tenthous on tenk1
     ->  Sort
!          Sort Key: (42), (42)
           ->  Index Only Scan using tenk1_hundred on tenk1 tenk1_1
  (6 rows)

--- 1426,1432 ----
     Sort Key: tenk1.thousand, tenk1.tenthous
     ->  Index Only Scan using tenk1_thous_tenthous on tenk1
     ->  Sort
!          Sort Key: 42, 42
           ->  Index Only Scan using tenk1_hundred on tenk1 tenk1_1
  (6 rows)


Re: Problems with "pg.dropped" column after upgrade 9.5 to 9.6

From: Michael Paquier
Date:
On Thu, Nov 3, 2016 at 2:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After further poking around, I concluded that it'd be a good idea to make
> a similar change in set_dummy_tlist_references(), to prevent a failure
> if the top plan node below ModifyTable chances to be a Sort or other
> non-projecting plan node.  I'm not sure such a case can arise today, but
> I'm not sure it can't either.

[ ... Back on this thread ... ]
Thanks for the input! It would have taken me way longer to figure out a
clean patch. I am not very familiar with this code :)

> I'm not sure if it's worth memorializing the specific test case discussed
> in this thread as a regression test.  It depends enough on the behavior
> of SQL function planning that I wouldn't have much confidence in it
> continuing to test what we meant it to test going forward.

Definitely fine with me not to include it; the regression tests modified
by your patch and the git history are enough to understand the story of
the fix.
--
Michael