Обсуждение: PG 14.5 -- Impossible to restore dump due to interaction/order of views, functions, and generated columns

Поиск
Список
Период
Сортировка

PG 14.5 -- Impossible to restore dump due to interaction/order of views, functions, and generated columns

От
"Nunya Business"
Дата:
Good afternoon,

I've recently run into a weird issue that I'm trying to gather more data on before sending an official bug report on the off chance that it's already been addressed.

Within my schema there is a table that has a GENERATED ALWAYS column that calls a plpgsql function.  The called function has a "row type" variable declared that  references a view.  While the schema itself functions properly day to day, and pg_dumpall works as expected, the generated SQL fails to successfully execute.  The table in question is restored with no rows, and an error is generated during the COPY stating that the type does not exist.

The issue appears to be that the COPY statement for the data is trying to execute the function specified for the GENERATED ALWAYS column, and that function cannot run because the view that the function references does not yet exist.

The dump was made with: pg_dumpall -c --quote-all-identifiers --exclude-database=postgres --exclude-database=template0 --exclude-database=template1

Is this a known or unknown issue, or am I just missing something?

Any insight is appreciated.  Please reply-all as I'm not currently subscribed to the list.  Thanks in advance!
"Nunya Business" <nb3425586@gmail.com> writes:
> Within my schema there is a table that has a GENERATED ALWAYS column 
> that calls a plpgsql function.  The called function has a "row type" 
> variable declared that  references a view.  While the schema itself 
> functions properly day to day, and pg_dumpall works as expected, the 
> generated SQL fails to successfully execute.  The table in question is 
> restored with no rows, and an error is generated during the COPY stating 
> that the type does not exist.

Hmm, do you have actually circular dependencies in that?  pg_dump has
some heuristics for dealing with such cases, but maybe it needs more.
Please create a self-contained example and submit it to pgsql-bugs.

            regards, tom lane



On 12/5/22 11:49, Nunya Business wrote:
> Good afternoon,
> 
> I've recently run into a weird issue that I'm trying to gather more data 
> on before sending an official bug report on the off chance that it's 
> already been addressed.
> 
> Within my schema there is a table that has a GENERATED ALWAYS column 
> that calls a plpgsql function.  The called function has a "row type" 
> variable declared that  references a view.  While the schema itself 
> functions properly day to day, and pg_dumpall works as expected, the 
> generated SQL fails to successfully execute.  The table in question is 
> restored with no rows, and an error is generated during the COPY stating 
> that the type does not exist.
> 
> The issue appears to be that the COPY statement for the data is trying 
> to execute the function specified for the GENERATED ALWAYS column, and 
> that function cannot run because the view that the function references 
> does not yet exist.
> 
> The dump was made with: pg_dumpall -c --quote-all-identifiers 
> --exclude-database=postgres --exclude-database=template0 
> --exclude-database=template1
> 
> Is this a known or unknown issue, or am I just missing something?

https://www.postgresql.org/docs/current/sql-createtable.html

"GENERATED ALWAYS AS ( generation_expr ) STORED

     This clause creates the column as a generated column. The column 
cannot be written to, and when read the result of the specified 
expression will be returned.

     The keyword STORED is required to signify that the column will be 
computed on write and will be stored on disk.

     The generation expression can refer to other columns in the table, 
but not other generated columns. Any functions and operators used must 
be immutable. **References to other tables are not allowed.**
"

Emphasis(**) added.

I'm going to say hiding the table/view reference in a function is not 
going to work any better then when folks try that in a CHECK constraint.


> 
> Any insight is appreciated.  Please reply-all as I'm not currently 
> subscribed to the list.  Thanks in advance!

-- 
Adrian Klaver
adrian.klaver@aklaver.com




>"Nunya Business" <nb3425586@gmail.com> writes:
>>Within my schema there is a table that has a GENERATED ALWAYS column that calls a plpgsql function.  The called
functionhas a "row type" variable declared that  references a view.  While the schema itself functions properly day to
day,and pg_dumpall works as expected, the generated SQL fails to successfully execute.  The table in question is
restoredwith no rows, and an error is generated during the COPY stating that the type does not exist. 
>
>Hmm, do you have actually circular dependencies in that?  pg_dump has
>some heuristics for dealing with such cases, but maybe it needs more.
>Please create a self-contained example and submit it to pgsql-bugs.
>
>            regards, tom lane

Thanks Tom.  There are indeed circular references in the schema and the
whole thing sort of doesn't pass the smell test, but this is my first
look at it.  The generated column on the table calls a function which
selects from a view that references the table.  The production schema
where I ran into this is pretty large and complex, so the contrived
example that follows may not be the minimum working example but it's
pretty small and has the same behavior regarding the SQL generated by
pg_dumpall.

It seems that the schema is probably invalid according to the GENERATED
rules and that pg_dumpall is operating as intended, but somehow the
check in the ALTER TABLE isn't deep enough to prevent the issue, but
maybe I'm mistaken.  Once this is created, if you insert a few rows and
execute pg_dumpall, the resulting SQL cannot be loaded and will fail
during the COPY, complaining that the view referenced by the function
doesn't exist.

Here is the schema.  CCing pgsql-bugs as requested.

--------------------------CUT
CREATE TABLE tblA (
   id serial unique not null,
   dt timestamp with time zone not null default now(),
   data text
);

CREATE OR REPLACE VIEW viewA as (
   SELECT sum(id) FROM tblA
);

CREATE OR REPLACE FUNCTION fnA(askid INTEGER) RETURNS integer
   LANGUAGE 'plpgsql' IMMUTABLE
   AS $$

   declare
     varA viewA;
     ret integer;

   begin
     SELECT viewA.*
     INTO varA
     FROM viewA;

     ret = varA.sum;

     return ret;
   end;
$$;

ALTER TABLE tblA ADD COLUMN idsum INTEGER GENERATED ALWAYS AS (fnA(id))
STORED;






>"Nunya Business" <nb3425586@gmail.com> writes:
>>Within my schema there is a table that has a GENERATED ALWAYS column that calls a plpgsql function.  The called
functionhas a "row type" variable declared that  references a view.  While the schema itself functions properly day to
day,and pg_dumpall works as expected, the generated SQL fails to successfully execute.  The table in question is
restoredwith no rows, and an error is generated during the COPY stating that the type does not exist. 
>
>Hmm, do you have actually circular dependencies in that?  pg_dump has
>some heuristics for dealing with such cases, but maybe it needs more.
>Please create a self-contained example and submit it to pgsql-bugs.
>
>            regards, tom lane

Thanks Tom.  There are indeed circular references in the schema and the
whole thing sort of doesn't pass the smell test, but this is my first
look at it.  The generated column on the table calls a function which
selects from a view that references the table.  The production schema
where I ran into this is pretty large and complex, so the contrived
example that follows may not be the minimum working example but it's
pretty small and has the same behavior regarding the SQL generated by
pg_dumpall.

It seems that the schema is probably invalid according to the GENERATED
rules and that pg_dumpall is operating as intended, but somehow the
check in the ALTER TABLE isn't deep enough to prevent the issue, but
maybe I'm mistaken.  Once this is created, if you insert a few rows and
execute pg_dumpall, the resulting SQL cannot be loaded and will fail
during the COPY, complaining that the view referenced by the function
doesn't exist.

Here is the schema.  CCing pgsql-bugs as requested.

--------------------------CUT
CREATE TABLE tblA (
   id serial unique not null,
   dt timestamp with time zone not null default now(),
   data text
);

CREATE OR REPLACE VIEW viewA as (
   SELECT sum(id) FROM tblA
);

CREATE OR REPLACE FUNCTION fnA(askid INTEGER) RETURNS integer
   LANGUAGE 'plpgsql' IMMUTABLE
   AS $$

   declare
     varA viewA;
     ret integer;

   begin
     SELECT viewA.*
     INTO varA
     FROM viewA;

     ret = varA.sum;

     return ret;
   end;
$$;

ALTER TABLE tblA ADD COLUMN idsum INTEGER GENERATED ALWAYS AS (fnA(id))
STORED;






"Nunya Business" <nb3425586@gmail.com> writes:
> Thanks Tom.  There are indeed circular references in the schema and the 
> whole thing sort of doesn't pass the smell test, but this is my first 
> look at it.  The generated column on the table calls a function which 
> selects from a view that references the table.  The production schema 
> where I ran into this is pretty large and complex, so the contrived 
> example that follows may not be the minimum working example but it's 
> pretty small and has the same behavior regarding the SQL generated by 
> pg_dumpall.

Hm.  The actual problem here is that fnA() is making unwarranted
assumptions about the search_path it's run under, so it fails when the
pg_dump script invokes it with a restrictive search_path.  If you change
the function text so that the references to viewA are schema-qualified,
then it restores without errors.

"Doesn't pass the smell test" is putting it mildly, btw.  Labeling
that function IMMUTABLE is a huge lie, and what it means is that
your GENERATED column doesn't have the amount of stability that
it's supposed to per spec.  I'm not sure exactly what sorts of
misbehaviors might ensue from that, but I'm pretty certain that the
data in the GENERATED column after dump/restore won't match what
you had there beforehand.

            regards, tom lane



"Nunya Business" <nb3425586@gmail.com> writes:
> Thanks Tom.  There are indeed circular references in the schema and the 
> whole thing sort of doesn't pass the smell test, but this is my first 
> look at it.  The generated column on the table calls a function which 
> selects from a view that references the table.  The production schema 
> where I ran into this is pretty large and complex, so the contrived 
> example that follows may not be the minimum working example but it's 
> pretty small and has the same behavior regarding the SQL generated by 
> pg_dumpall.

Hm.  The actual problem here is that fnA() is making unwarranted
assumptions about the search_path it's run under, so it fails when the
pg_dump script invokes it with a restrictive search_path.  If you change
the function text so that the references to viewA are schema-qualified,
then it restores without errors.

"Doesn't pass the smell test" is putting it mildly, btw.  Labeling
that function IMMUTABLE is a huge lie, and what it means is that
your GENERATED column doesn't have the amount of stability that
it's supposed to per spec.  I'm not sure exactly what sorts of
misbehaviors might ensue from that, but I'm pretty certain that the
data in the GENERATED column after dump/restore won't match what
you had there beforehand.

            regards, tom lane



Tom,

Thanks for the insight, I didn't even consider the search path being an
issue and I should have.  I saw it explicitly specified in other parts
of the dump and just assumed it was being done in the function as well.
For example, the CREATE statements in the dump output all specify the
schema name even though it's not specified in the original statements.
I suppose expecting statements in the function body to be similarly
qualified in the dump is unreasonable considering the complexity of
doing so.

Anyway, thanks for the tip.  If I run into something like this again in
the future I'll have a better idea of where to look and how to recover
what data I can in an emergency.




------ Original Message ------
From "Tom Lane" <tgl@sss.pgh.pa.us>
To "Nunya Business" <nb3425586@gmail.com>
Cc pgsql-general@postgresql.org; pgsql-bugs@postgresql.org
Date 2022-12-06 15:47:26
Subject Re: Re[2]: PG 14.5 -- Impossible to restore dump due to
interaction/order of views, functions, and generated columns

>"Nunya Business" <nb3425586@gmail.com> writes:
>>Thanks Tom.  There are indeed circular references in the schema and the whole thing sort of doesn't pass the smell
test,but this is my first look at it.  The generated column on the table calls a function which selects from a view
thatreferences the table.  The production schema where I ran into this is pretty large and complex, so the contrived
examplethat follows may not be the minimum working example but it's pretty small and has the same behavior regarding
theSQL generated by pg_dumpall. 
>
>Hm.  The actual problem here is that fnA() is making unwarranted
>assumptions about the search_path it's run under, so it fails when the
>pg_dump script invokes it with a restrictive search_path.  If you change
>the function text so that the references to viewA are schema-qualified,
>then it restores without errors.
>
>"Doesn't pass the smell test" is putting it mildly, btw.  Labeling
>that function IMMUTABLE is a huge lie, and what it means is that
>your GENERATED column doesn't have the amount of stability that
>it's supposed to per spec.  I'm not sure exactly what sorts of
>misbehaviors might ensue from that, but I'm pretty certain that the
>data in the GENERATED column after dump/restore won't match what
>you had there beforehand.
>
>            regards, tom lane



Tom,

Thanks for the insight, I didn't even consider the search path being an
issue and I should have.  I saw it explicitly specified in other parts
of the dump and just assumed it was being done in the function as well.
For example, the CREATE statements in the dump output all specify the
schema name even though it's not specified in the original statements.
I suppose expecting statements in the function body to be similarly
qualified in the dump is unreasonable considering the complexity of
doing so.

Anyway, thanks for the tip.  If I run into something like this again in
the future I'll have a better idea of where to look and how to recover
what data I can in an emergency.




------ Original Message ------
From "Tom Lane" <tgl@sss.pgh.pa.us>
To "Nunya Business" <nb3425586@gmail.com>
Cc pgsql-general@postgresql.org; pgsql-bugs@postgresql.org
Date 2022-12-06 15:47:26
Subject Re: Re[2]: PG 14.5 -- Impossible to restore dump due to
interaction/order of views, functions, and generated columns

>"Nunya Business" <nb3425586@gmail.com> writes:
>>Thanks Tom.  There are indeed circular references in the schema and the whole thing sort of doesn't pass the smell
test,but this is my first look at it.  The generated column on the table calls a function which selects from a view
thatreferences the table.  The production schema where I ran into this is pretty large and complex, so the contrived
examplethat follows may not be the minimum working example but it's pretty small and has the same behavior regarding
theSQL generated by pg_dumpall. 
>
>Hm.  The actual problem here is that fnA() is making unwarranted
>assumptions about the search_path it's run under, so it fails when the
>pg_dump script invokes it with a restrictive search_path.  If you change
>the function text so that the references to viewA are schema-qualified,
>then it restores without errors.
>
>"Doesn't pass the smell test" is putting it mildly, btw.  Labeling
>that function IMMUTABLE is a huge lie, and what it means is that
>your GENERATED column doesn't have the amount of stability that
>it's supposed to per spec.  I'm not sure exactly what sorts of
>misbehaviors might ensue from that, but I'm pretty certain that the
>data in the GENERATED column after dump/restore won't match what
>you had there beforehand.
>
>            regards, tom lane