Обсуждение: pg_upgrade allows itself to be run twice
Now that I've gotten your attention.. I expect pg_upgrade to fail if I run it twice in a row. It would be reasonable if it were to fail during the "--check" phase, maybe by failing like this: | New cluster database "..." is not empty: found relation "..." But that fails to happen if the cluster has neither tables nor matviews, in which case, it passes the check phase and then fails like this: Performing Upgrade ------------------ Analyzing all rows in the new cluster ok Freezing all rows in the new cluster ok Deleting files from new pg_xact ok Copying old pg_clog to new server ok Setting oldest XID for new cluster ok Setting next transaction ID and epoch for new cluster ok Deleting files from new pg_multixact/offsets ok Copying old pg_multixact/offsets to new server ok Deleting files from new pg_multixact/members ok Copying old pg_multixact/members to new server ok Setting next multixact ID and offset for new cluster ok Resetting WAL archives ok Setting frozenxid and minmxid counters in new cluster connection to server on socket "/home/pryzbyj/src/postgres/.s.PGSQL.50432"failed: FATAL: could not open relation with OID 2610 Failure, exiting I'll concede that a cluster which has no tables sounds a lot like a toy, but I find it disturbing that nothing prevents running the process twice, up to the point that it's evidently corrupted the catalog. While looking at this, I noticed that starting postgres --single immediately after initdb allows creating objects with OIDs below FirstNormalObjectId (thereby defeating pg_upgrade's check). I'm not familiar with the behavioral differences of single user mode, and couldn't find anything in the documentation.
On Sat, Jun 25, 2022 at 11:04:37AM -0500, Justin Pryzby wrote: > I expect pg_upgrade to fail if I run it twice in a row. Yep. > It would be reasonable if it were to fail during the "--check" phase, > maybe by failing like this: > | New cluster database "..." is not empty: found relation "..." So, we get a complaint that the new cluster is not empty after one pg_upgrade run with a new command of pg_upgrade, with or without --check. This happens in check_new_cluster(), where we'd fatal if a relation uses a namespace different than pg_catalog. > But that fails to happen if the cluster has neither tables nor matviews, in > which case, it passes the check phase and then fails like this: Indeed, as of get_rel_infos(). > I'll concede that a cluster which has no tables sounds a lot like a toy, but I > find it disturbing that nothing prevents running the process twice, up to the > point that it's evidently corrupted the catalog. I have heard of cases where instances were only used with a set of foreign tables, for example. Not sure that this is spread enough to worry about, but this would fail as much as your case. > While looking at this, I noticed that starting postgres --single immediately > after initdb allows creating objects with OIDs below FirstNormalObjectId > (thereby defeating pg_upgrade's check). I'm not familiar with the behavioral > differences of single user mode, and couldn't find anything in the > documentation. This one comes from NextOID in the control data file after a fresh initdb, and GetNewObjectId() would enforce that in a postmaster environment to be FirstNormalObjectId when assigning the first user OID. Would you imply an extra step at the end of initdb to update the control data file of the new cluster to reflect FirstNormalObjectId? -- Michael
Вложения
On Wed, Jun 29, 2022 at 01:17:33PM +0900, Michael Paquier wrote: > On Sat, Jun 25, 2022 at 11:04:37AM -0500, Justin Pryzby wrote: > > > I'll concede that a cluster which has no tables sounds a lot like a toy, but I > > find it disturbing that nothing prevents running the process twice, up to the > > point that it's evidently corrupted the catalog. > > I have heard of cases where instances were only used with a set of > foreign tables, for example. Not sure that this is spread enough to > worry about, but this would fail as much as your case. foreign tables have OIDs too. ts=# SELECT * FROM pg_class WHERE relkind ='f'; oid | 1554544611 > > While looking at this, I noticed that starting postgres --single immediately > > after initdb allows creating objects with OIDs below FirstNormalObjectId > > (thereby defeating pg_upgrade's check). I'm not familiar with the behavioral > > differences of single user mode, and couldn't find anything in the > > documentation. > > This one comes from NextOID in the control data file after a fresh > initdb, and GetNewObjectId() would enforce that in a postmaster > environment to be FirstNormalObjectId when assigning the first user > OID. Would you imply an extra step at the end of initdb to update the > control data file of the new cluster to reflect FirstNormalObjectId? I added a call to reset xlog, similar to what's in pg_upgrade. Unfortunately, I don't see an easy way to silence it. -- Justin
Вложения
rebased and updated Robert thought that it might be reasonable for someone to initdb, and then connect and make some modifications, and then pg_upgrade. https://www.postgresql.org/message-id/CA%2BTgmoYwaXh_wRRa2CqL4XpM4r6YEbq1%2Bec%3D%2B8b7Dr7-T_tT%2BQ%40mail.gmail.com But the DBs are dropped by pg_upgrade, so that seems to serve no purpose, except for shared relations (and global objects?). In the case of shared relations, it seems unsafe (even though my test didn't cause an immediate explosion). So rather than continuing to allow arbitrary changes between initdb and pg_upgrade, I propose to prohibit all changes. I'd consider allowing an "inclusive" list of allowable changes, if someone were to propose such a thing - but since DBs are dropped, I'm not sure what it could include.
Вложения
On 07.07.22 08:22, Justin Pryzby wrote: >> This one comes from NextOID in the control data file after a fresh >> initdb, and GetNewObjectId() would enforce that in a postmaster >> environment to be FirstNormalObjectId when assigning the first user >> OID. Would you imply an extra step at the end of initdb to update the >> control data file of the new cluster to reflect FirstNormalObjectId? > I added a call to reset xlog, similar to what's in pg_upgrade. > Unfortunately, I don't see an easy way to silence it. I think it would be better to update the control file directly instead of going through pg_resetwal. (See src/include/common/controldata_utils.h for the required functions.) However, I don't know whether we need to add special provisions that guard against people using postgres --single in complicated ways. Many consider the single-user mode deprecated outside of initdb use.
On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote: > On 07.07.22 08:22, Justin Pryzby wrote: > > > This one comes from NextOID in the control data file after a fresh > > > initdb, and GetNewObjectId() would enforce that in a postmaster > > > environment to be FirstNormalObjectId when assigning the first user > > > OID. Would you imply an extra step at the end of initdb to update the > > > control data file of the new cluster to reflect FirstNormalObjectId? > > I added a call to reset xlog, similar to what's in pg_upgrade. > > Unfortunately, I don't see an easy way to silence it. > > I think it would be better to update the control file directly instead of > going through pg_resetwal. (See src/include/common/controldata_utils.h for > the required functions.) > > However, I don't know whether we need to add special provisions that guard > against people using postgres --single in complicated ways. Many consider > the single-user mode deprecated outside of initdb use. Thanks for looking. One other thing I noticed (by accident!) is that pg_upgrade doesn't prevent itself from trying to upgrade a cluster on top of itself: | $ /usr/pgsql-15/bin/initdb -D pg15.dat -N | $ /usr/pgsql-15/bin/pg_upgrade -D pg15.dat -d pg15.dat -b /usr/pgsql-15/bin | Performing Upgrade | ------------------ | Analyzing all rows in the new cluster ok | Freezing all rows in the new cluster ok | Deleting files from new pg_xact ok ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | Copying old pg_xact to new server | *failure* | | Consult the last few lines of "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" for >> | command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log"2>&1 | cp: cannot stat 'pg15.dat/pg_xact': No such file or directory This may be of little concern since it's upgrading a version to itself, which only applies to developers. -- Justin
On 01.11.22 14:07, Justin Pryzby wrote: > On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote: >> On 07.07.22 08:22, Justin Pryzby wrote: >>>> This one comes from NextOID in the control data file after a fresh >>>> initdb, and GetNewObjectId() would enforce that in a postmaster >>>> environment to be FirstNormalObjectId when assigning the first user >>>> OID. Would you imply an extra step at the end of initdb to update the >>>> control data file of the new cluster to reflect FirstNormalObjectId? >>> I added a call to reset xlog, similar to what's in pg_upgrade. >>> Unfortunately, I don't see an easy way to silence it. >> >> I think it would be better to update the control file directly instead of >> going through pg_resetwal. (See src/include/common/controldata_utils.h for >> the required functions.) >> >> However, I don't know whether we need to add special provisions that guard >> against people using postgres --single in complicated ways. Many consider >> the single-user mode deprecated outside of initdb use. > > Thanks for looking. I think the above is a "returned with feedback" at this point. > One other thing I noticed (by accident!) is that pg_upgrade doesn't > prevent itself from trying to upgrade a cluster on top of itself: > > | $ /usr/pgsql-15/bin/initdb -D pg15.dat -N > | $ /usr/pgsql-15/bin/pg_upgrade -D pg15.dat -d pg15.dat -b /usr/pgsql-15/bin > | Performing Upgrade > | ------------------ > | Analyzing all rows in the new cluster ok > | Freezing all rows in the new cluster ok > | Deleting files from new pg_xact ok > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > | Copying old pg_xact to new server > | *failure* > | > | Consult the last few lines of "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" for >>> > | command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log"2>&1 > | cp: cannot stat 'pg15.dat/pg_xact': No such file or directory > > This may be of little concern since it's upgrading a version to itself, which > only applies to developers. I think this would be worth addressing nonetheless, for robustness. For comparison, "cp" and "mv" will error if you give source and destination that are the same file.
On Thu, Dec 01, 2022 at 10:30:16AM +0100, Peter Eisentraut wrote: > On 01.11.22 14:07, Justin Pryzby wrote: > > On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote: > > > On 07.07.22 08:22, Justin Pryzby wrote: > > > > > This one comes from NextOID in the control data file after a fresh > > > > > initdb, and GetNewObjectId() would enforce that in a postmaster > > > > > environment to be FirstNormalObjectId when assigning the first user > > > > > OID. Would you imply an extra step at the end of initdb to update the > > > > > control data file of the new cluster to reflect FirstNormalObjectId? > > > > I added a call to reset xlog, similar to what's in pg_upgrade. > > > > Unfortunately, I don't see an easy way to silence it. > > > > > > I think it would be better to update the control file directly instead of > > > going through pg_resetwal. (See src/include/common/controldata_utils.h for > > > the required functions.) > > > > > > However, I don't know whether we need to add special provisions that guard > > > against people using postgres --single in complicated ways. Many consider > > > the single-user mode deprecated outside of initdb use. > > > > Thanks for looking. To be clear, I don't think it's worth doing anything special just to avoid creating special OIDs if someone runs postgres --single immediately after initdb. However, setting FirstNormalOid in initdb itself (rather than in the next invocation of postgres, if it isn't in single-user-mode) was the mechanism by which to avoid the original problem that pg_upgrade can be run twice, if there are no non-system relations. That still seems desirable to fix somehow. > I think the above is a "returned with feedback" at this point. > > > One other thing I noticed (by accident!) is that pg_upgrade doesn't > > prevent itself from trying to upgrade a cluster on top of itself: > > | command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log"2>&1 > > | cp: cannot stat 'pg15.dat/pg_xact': No such file or directory > > > > This may be of little concern since it's upgrading a version to itself, which > > only applies to developers. > > I think this would be worth addressing nonetheless, for robustness. For > comparison, "cp" and "mv" will error if you give source and destination that > are the same file. I addressed this in 002. -- Justin
Вложения
On Fri, Dec 16, 2022 at 07:38:09AM -0600, Justin Pryzby wrote: > However, setting FirstNormalOid in initdb itself (rather than in the > next invocation of postgres, if it isn't in single-user-mode) was the > mechanism by which to avoid the original problem that pg_upgrade can be > run twice, if there are no non-system relations. That still seems > desirable to fix somehow. + if (new_cluster.controldata.chkpnt_nxtoid != FirstNormalObjectId) + pg_fatal("New cluster is not pristine: OIDs have been assigned since initdb (%u != %u)\n", + new_cluster.controldata.chkpnt_nxtoid, FirstNormalObjectId); On wraparound FirstNormalObjectId would be the first value we use for nextOid. Okay, that's very unlikely going to happen, still, strictly speaking, that could be incorrect. >> I think this would be worth addressing nonetheless, for robustness. For >> comparison, "cp" and "mv" will error if you give source and destination that >> are the same file. > > I addressed this in 002. + if (strcmp(make_absolute_path(old_cluster.pgdata), + make_absolute_path(new_cluster.pgdata)) == 0) + pg_fatal("cannot upgrade a cluster on top of itself"); Shouldn't this be done after adjust_data_dir(), which is the point where we'll know the actual data folders we are working on by querying postgres -C data_directory? -- Michael