Обсуждение: Invalid dump file after drop of role that previously created extension containing a table.
Hello, The bug is reproducible on PG16, but we have detected it during pg_upgrade 14 -> 15. Testcase is provided (Pg 14, 15, 16, Ubuntu 22.04). The upgrade failure and the test case description will follow: Check run was OK: ================= -bash-4.2$ /usr/pgsql-15/bin/pg_upgrade -d /pgsql/cluster/14/data/ -D /pgsql/cluster/15/data/ -b /usr/pgsql-14/bin -B /usr/pgsql-15/bin --link --check Performing Consistency Checks ----------------------------- Checking cluster versions ok Checking database user is the install user ok Checking database connection settings ok Checking for prepared transactions ok Checking for system-defined composite types in user tables ok Checking for reg* data types in user tables ok Checking for contrib/isn with bigint-passing mismatch ok Checking for presence of required libraries ok Checking database user is the install user ok Checking for prepared transactions ok Checking for new cluster tablespace directories ok *Clusters are compatible* The upgrade itself fails (--retain was used to allow the next analysis steps): ============================================================================== -bash-4.2$ /usr/pgsql-15/bin/pg_upgrade -d /pgsql/cluster/14/data/ -D /pgsql/cluster/15/data/ -b /usr/pgsql-14/bin -B /usr/pgsql-15/bin --link --retain Performing Consistency Checks ----------------------------- Checking cluster versions ok Checking database user is the install user ok Checking database connection settings ok Checking for prepared transactions ok Checking for system-defined composite types in user tables ok Checking for reg* data types in user tables ok Checking for contrib/isn with bigint-passing mismatch ok Creating dump of global objects ok Creating dump of database schemas ok Checking for presence of required libraries ok Checking database user is the install user ok Checking for prepared transactions ok Checking for new cluster tablespace directories ok If pg_upgrade fails after this point, you must re-initdb the new cluster before continuing. Performing Upgrade ------------------ Analyzing all rows in the new cluster ok Freezing all rows in the new cluster ok Deleting files from new pg_xact ok Copying old pg_xact to new server ok Setting oldest XID for new cluster ok Setting next transaction ID and epoch for new cluster ok Deleting files from new pg_multixact/offsets ok Copying old pg_multixact/offsets to new server ok Deleting files from new pg_multixact/members ok Copying old pg_multixact/members to new server ok Setting next multixact ID and offset for new cluster ok Resetting WAL archives ok Setting frozenxid and minmxid counters in new cluster ok Restoring global objects in the new cluster ok Restoring database schemas in the new cluster betsys *failure* Consult the last few lines of "/pgsql/cluster/15/data/pg_upgrade_output.d/20230921T191028.441/log/pg_upgrade_dump_16451.log" for the probable cause of the failure. Failure, exiting The error message in the log: ============================= pg_restore: creating ACL "cron.SEQUENCE "jobid_seq"" pg_restore: creating ACL "cron.TABLE "job"" pg_restore: while PROCESSING TOC: pg_restore: from TOC entry 6678; 0 0 ACL TABLE "job" sazky pg_restore: error: could not execute query: ERROR: role "16441" does not exist Command was: SELECT pg_catalog.binary_upgrade_set_record_init_privs(true); Get schema-only SQL script from the dump file whose import has failed during pg_upgrade: ======================================================================================== /usr/pgsql-15/bin/pg_restore -s -f /pgsql/cluster/15/data/pg_upgrade_output.d/20230921T191028.441/dump/pg_upgrade_dump_16451.schema_only.sql /pgsql/cluster/15/data/pg_upgrade_output.d/20230921T191028.441/dump/pg_upgrade_dump_16451.custom Check for the "16441" string in the SQL file produced from the dump created by pg_upgrade: ========================================================================================== -bash-4.2$ grep "16441" /pgsql/cluster/15/data/pg_upgrade_output.d/20230921T191028.441/dump/pg_upgrade_dump_16451.schema_only.sql GRANT ALL ON TABLE "cron"."job" TO "16441"; REVOKE ALL ON TABLE "cron"."job" FROM "16441"; Observation: ============ The pg_dump beyond other tables reads data from the pg_catalog.pg_init_privs table to construct the above-listed commands to preserve ACLs from the database being upgraded (actually, these are part of ordinary pg_dump anyway). In our case, I've realized that 16441 is an OID value for a previously dropped login role (database user). Testcase description: ==================== 1) An extension (I've used pg_cron as an example because it contains a table) is created by a database user (login role), and the initial privileges at extension creation are stored for the extension object (table in my test case) in the pg_catalog.pg_init_privs table. 2) Change the database user objects ownership from step 1 to another database user -> this step keeps the pg_catalog.pg_init_privs table content for the extension table from step 1 untouched. 3) Drop the database user used in step 1 and as its entry is deleted from the catalog, all that remains is the OID of the deleted database user in the pg_catalog.pg_init_privs table, later used by pg_dump. I was able to reproduce the issue on several PostgreSQL versions: ================================================================= PostgreSQL 16.0 (Ubuntu 16.0-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit PostgreSQL 15.4 (Ubuntu 15.4-2.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit PostgreSQL 14.9 (Ubuntu 14.9-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit Result from the test case: ========================== -------------------------------------------------------------------------------- Print DUMP differences before and after drop of role user1 -------------------------------------------------------------------------------- 80c80 < REVOKE ALL ON FUNCTION cron.alter_job(job_id bigint, schedule text, command text, database text, username text, active boolean) FROM user1; --- > REVOKE ALL ON FUNCTION cron.alter_job(job_id bigint, schedule text, command text, database text, username text, activeboolean) FROM "16384"; 88c88 < REVOKE ALL ON FUNCTION cron.schedule_in_database(job_name text, schedule text, command text, database text, username text, active boolean) FROM user1; --- > REVOKE ALL ON FUNCTION cron.schedule_in_database(job_name text, schedule text, command text, database text, username text,active boolean) FROM "16384"; 96c96 < REVOKE ALL ON SEQUENCE cron.jobid_seq FROM user1; --- > REVOKE ALL ON SEQUENCE cron.jobid_seq FROM "16384"; 106c106 < REVOKE ALL ON TABLE cron.job FROM user1; --- > REVOKE ALL ON TABLE cron.job FROM "16384"; 116c116 < REVOKE ALL ON SEQUENCE cron.runid_seq FROM user1; --- > REVOKE ALL ON SEQUENCE cron.runid_seq FROM "16384"; 126c126 < REVOKE ALL ON TABLE cron.job_run_details FROM user1; --- > REVOKE ALL ON TABLE cron.job_run_details FROM "16384"; The test case script and log files are attached. Kind regards Ales Zeleny
Вложения
Greetings, * Aleš Zelený (zeleny.ales@gmail.com) wrote: > Testcase description: > ==================== > 1) An extension (I've used pg_cron as an example because it contains a > table) is created by a database user (login role), and the initial > privileges at extension creation are stored for the extension object > (table in my test case) in the pg_catalog.pg_init_privs table. > 2) Change the database user objects ownership from step 1 to another > database user -> this step keeps the pg_catalog.pg_init_privs table > content for the extension table from step 1 untouched. > 3) Drop the database user used in step 1 and as its entry is deleted > from the catalog, all that remains is the OID of the deleted database > user in the pg_catalog.pg_init_privs table, later used by pg_dump. Hrmpf. Yeah, seems like if we're going to allow extensions and extension objects to be impacted by REASSIGN OWNED and such then we need to be sure to update pg_init_privs accordingly. At least that's my first thought seeing this. Would welcome thoughts from others on this though. Thanks, Stephen
Вложения
Hello,
my personal understanding of the pg_init_privs usage is to achieve some level of idempotency (some level -> it perfectly works at least if the role that created the extension is the same one that is importing the dump - other combinations are producing more or less different results).
my personal understanding of the pg_init_privs usage is to achieve some level of idempotency (some level -> it perfectly works at least if the role that created the extension is the same one that is importing the dump - other combinations are producing more or less different results).
So the pg_dump attempts to combine the past time state from pg_init_privs with the present state at the dump creation time.
Revoking all privileges from extension tables (and other objects) might solve it partially if all privileges for the extension objects are dumped and therefore applied on restore. Sadly, except (at least the following one) an issue - the dump file does not contain the extension version so if a newer extension version is created during restoration from a dump file there might be additional privileges defined by the new extension version and thus it is a bad idea to "revoke all" since they will be lost.
While it might take some time to get a sustainable solution, a non-ideal workaround I could imagine could be that the dump will check whether the role(s) stored in the pg_init_privs still exists at the time of dump creation and in case they are no longer the revoke statement will not be stored in the dump.
Kind regards Ales Zeleny
čt 21. 9. 2023 v 21:30 odesílatel Stephen Frost <sfrost@snowman.net> napsal:
Greetings,
* Aleš Zelený (zeleny.ales@gmail.com) wrote:
> Testcase description:
> ====================
> 1) An extension (I've used pg_cron as an example because it contains a
> table) is created by a database user (login role), and the initial
> privileges at extension creation are stored for the extension object
> (table in my test case) in the pg_catalog.pg_init_privs table.
> 2) Change the database user objects ownership from step 1 to another
> database user -> this step keeps the pg_catalog.pg_init_privs table
> content for the extension table from step 1 untouched.
> 3) Drop the database user used in step 1 and as its entry is deleted
> from the catalog, all that remains is the OID of the deleted database
> user in the pg_catalog.pg_init_privs table, later used by pg_dump.
Hrmpf. Yeah, seems like if we're going to allow extensions and
extension objects to be impacted by REASSIGN OWNED and such then we need
to be sure to update pg_init_privs accordingly. At least that's my
first thought seeing this. Would welcome thoughts from others on this
though.
Thanks,
Stephen
Hello,
during other upgrades, I've faced the same issue (with different extensions) on 3 out of 9 databases I've been upgrading (of course, it happened in the production environment).
So far, I've decided to add a check query to the database upgrade workflow:
select initprivs::text::aclitem[] from pg_init_privs;
It fails in case a dropped role OID is persisted in the pg_catalog.pg_init_privs on databases being upgraded, would it make sense to extend pg_upgrade check mode by a query like this (probably better one...)?
Kind regards Ales Zeleny
pá 22. 9. 2023 v 14:56 odesílatel Aleš Zelený <zeleny.ales@gmail.com> napsal:
Hello,
my personal understanding of the pg_init_privs usage is to achieve some level of idempotency (some level -> it perfectly works at least if the role that created the extension is the same one that is importing the dump - other combinations are producing more or less different results).So the pg_dump attempts to combine the past time state from pg_init_privs with the present state at the dump creation time.Revoking all privileges from extension tables (and other objects) might solve it partially if all privileges for the extension objects are dumped and therefore applied on restore. Sadly, except (at least the following one) an issue - the dump file does not contain the extension version so if a newer extension version is created during restoration from a dump file there might be additional privileges defined by the new extension version and thus it is a bad idea to "revoke all" since they will be lost.While it might take some time to get a sustainable solution, a non-ideal workaround I could imagine could be that the dump will check whether the role(s) stored in the pg_init_privs still exists at the time of dump creation and in case they are no longer the revoke statement will not be stored in the dump.Kind regards Ales Zelenyčt 21. 9. 2023 v 21:30 odesílatel Stephen Frost <sfrost@snowman.net> napsal:Greetings,
* Aleš Zelený (zeleny.ales@gmail.com) wrote:
> Testcase description:
> ====================
> 1) An extension (I've used pg_cron as an example because it contains a
> table) is created by a database user (login role), and the initial
> privileges at extension creation are stored for the extension object
> (table in my test case) in the pg_catalog.pg_init_privs table.
> 2) Change the database user objects ownership from step 1 to another
> database user -> this step keeps the pg_catalog.pg_init_privs table
> content for the extension table from step 1 untouched.
> 3) Drop the database user used in step 1 and as its entry is deleted
> from the catalog, all that remains is the OID of the deleted database
> user in the pg_catalog.pg_init_privs table, later used by pg_dump.
Hrmpf. Yeah, seems like if we're going to allow extensions and
extension objects to be impacted by REASSIGN OWNED and such then we need
to be sure to update pg_init_privs accordingly. At least that's my
first thought seeing this. Would welcome thoughts from others on this
though.
Thanks,
Stephen