Обсуждение: Re: pgsql: Remove the restriction that the relmap must be 512 bytes.
On 2022-Jul-26, Robert Haas wrote: > Remove the restriction that the relmap must be 512 bytes. > > Instead of relying on the ability to atomically overwrite the > entire relmap file in one shot, write a new one and durably > rename it into place. Removing the struct padding and the > calculation showing why the map is exactly 512 bytes, and change > the maximum number of entries to a nearby round number. Another thing that seems to have happened here is that catversion ought to have been touched and wasn't. Trying to start a cluster that was initdb'd with the previous code enters an infinite loop that dies each time with 2022-07-27 19:17:27.589 CEST [2516547] LOG: database system is ready to accept connections 2022-07-27 19:17:27.589 CEST [2516730] FATAL: could not read file "global/pg_filenode.map": read 512 of 524 2022-07-27 19:17:27.589 CEST [2516731] FATAL: could not read file "global/pg_filenode.map": read 512 of 524 2022-07-27 19:17:27.589 CEST [2516547] LOG: autovacuum launcher process (PID 2516730) exited with exit code 1 2022-07-27 19:17:27.589 CEST [2516547] LOG: terminating any other active server processes Perhaps we should still do a catversion bump now, since one hasn't happened since the commit. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "El número de instalaciones de UNIX se ha elevado a 10, y se espera que este número aumente" (UPM, 1972)
On Wed, Jul 27, 2022 at 1:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Another thing that seems to have happened here is that catversion ought > to have been touched and wasn't. Trying to start a cluster that was > initdb'd with the previous code enters an infinite loop that dies each > time with > > 2022-07-27 19:17:27.589 CEST [2516547] LOG: database system is ready to accept connections > 2022-07-27 19:17:27.589 CEST [2516730] FATAL: could not read file "global/pg_filenode.map": read 512 of 524 > 2022-07-27 19:17:27.589 CEST [2516731] FATAL: could not read file "global/pg_filenode.map": read 512 of 524 > 2022-07-27 19:17:27.589 CEST [2516547] LOG: autovacuum launcher process (PID 2516730) exited with exit code 1 > 2022-07-27 19:17:27.589 CEST [2516547] LOG: terminating any other active server processes > > Perhaps we should still do a catversion bump now, since one hasn't > happened since the commit. Hmm, interesting. I didn't think about bumping catversion because I didn't change anything in the catalogs. I did think about changing the magic number for the file at one point, but unlike some of our other constants, there's no indication that this one is intended to be used as a version number. But in retrospect it would have been good to change something somewhere. If you want me to bump catversion now, I can. If you or someone else wants to do it, that's also fine. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jul 27, 2022 at 1:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> Another thing that seems to have happened here is that catversion ought >> to have been touched and wasn't. > Hmm, interesting. I didn't think about bumping catversion because I > didn't change anything in the catalogs. I did think about changing the > magic number for the file at one point, but unlike some of our other > constants, there's no indication that this one is intended to be used > as a version number. But in retrospect it would have been good to > change something somewhere. If you want me to bump catversion now, I > can. If you or someone else wants to do it, that's also fine. If there's a magic number, then I'd (a) change that and (b) adjust whatever comments led you to think you shouldn't. Bumping catversion is a good fallback choice when there's not any more-proximate version indicator, but here there is. regards, tom lane
On Wed, Jul 27, 2022 at 1:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > If there's a magic number, then I'd (a) change that and (b) adjust > whatever comments led you to think you shouldn't. Bumping catversion > is a good fallback choice when there's not any more-proximate version > indicator, but here there is. Maybe I just got cold feet because it doesn't ever have seem to have been changed before, because the definition says: #define RELMAPPER_FILEMAGIC 0x592717 /* version ID value */ And the fact that "version" is in there sure makes it seem like a version number, now that I look again. I'll add 1 to the value. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Jul 27, 2022 at 2:13 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jul 27, 2022 at 1:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > If there's a magic number, then I'd (a) change that and (b) adjust > > whatever comments led you to think you shouldn't. Bumping catversion > > is a good fallback choice when there's not any more-proximate version > > indicator, but here there is. > > Maybe I just got cold feet because it doesn't ever have seem to have > been changed before, because the definition says: > > #define RELMAPPER_FILEMAGIC 0x592717 /* version ID value */ > > And the fact that "version" is in there sure makes it seem like a > version number, now that I look again. > > I'll add 1 to the value. Hmm, but that doesn't actually produce nice behavior. It just goes into an infinite loop, like this: 2022-07-27 14:21:12.826 EDT [32849] LOG: database system was interrupted; last known up at 2022-07-27 14:21:12 EDT 2022-07-27 14:21:12.860 EDT [32849] LOG: database system was not properly shut down; automatic recovery in progress 2022-07-27 14:21:12.861 EDT [32849] LOG: invalid record length at 0/14B3BB8: wanted 24, got 0 2022-07-27 14:21:12.861 EDT [32849] LOG: redo is not required 2022-07-27 14:21:12.864 EDT [32850] LOG: checkpoint starting: end-of-recovery immediate wait 2022-07-27 14:21:12.865 EDT [32850] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/14B3BB8, redo lsn=0/14B3BB8 2022-07-27 14:21:12.868 EDT [31930] LOG: database system is ready to accept connections 2022-07-27 14:21:12.869 EDT [32853] FATAL: relation mapping file "global/pg_filenode.map" contains invalid data 2022-07-27 14:21:12.869 EDT [32854] FATAL: relation mapping file "global/pg_filenode.map" contains invalid data 2022-07-27 14:21:12.870 EDT [31930] LOG: autovacuum launcher process (PID 32853) exited with exit code 1 2022-07-27 14:21:12.870 EDT [31930] LOG: terminating any other active server processes 2022-07-27 14:21:12.870 EDT [31930] LOG: background worker "logical replication launcher" (PID 32854) exited with exit code 1 2022-07-27 14:21:12.871 EDT [31930] LOG: all server processes terminated; reinitializing While I agree that changing a version identifier that is more closely related to what got changed is better than changing a generic one, I think people won't like an infinite loop that spews messages into the log at top speed as a way of telling them about the problem. So now I'm back to being unsure what to do here. -- Robert Haas EDB: http://www.enterprisedb.com
On 2022-Jul-27, Robert Haas wrote: > Hmm, but that doesn't actually produce nice behavior. It just goes > into an infinite loop, like this: > 2022-07-27 14:21:12.869 EDT [32853] FATAL: relation mapping file > "global/pg_filenode.map" contains invalid data This seems almost identical what happens without the version number change, so I wouldn't call it much of an improvement. > While I agree that changing a version identifier that is more closely > related to what got changed is better than changing a generic one, I > think people won't like an infinite loop that spews messages into the > log at top speed as a way of telling them about the problem. > > So now I'm back to being unsure what to do here. I vote to go for the catversion bump for now. Maybe it's possible to patch the relmapper code afterwards, so that if a version mismatch is detected the server is stopped with a reasonable message. An hypothetical improvement would be to keep the code to read the old version and upgrade the file automatically, but given the number of times that this file has changed, it's likely pointless effort. Therefore, my proposal is to add a comment next to the struct definition suggesting to bump catversion and call it a day. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> Hmm, but that doesn't actually produce nice behavior. It just goes >> into an infinite loop, like this: >> So now I'm back to being unsure what to do here. > I vote to go for the catversion bump for now. What this is showing us is that any form of corruption in the relmapper file causes very unpleasant behavior. We probably had better do something about that, independently of this issue. In the meantime, I still think bumping the file magic number is a better answer. It won't make the behavior any worse for un-updated code than it is already. regards, tom lane
On Wed, Jul 27, 2022 at 3:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > >> Hmm, but that doesn't actually produce nice behavior. It just goes > >> into an infinite loop, like this: > >> So now I'm back to being unsure what to do here. > > > I vote to go for the catversion bump for now. > > What this is showing us is that any form of corruption in the relmapper > file causes very unpleasant behavior. We probably had better do something > about that, independently of this issue. I'm not sure how important that is, but it certainly wouldn't hurt. > In the meantime, I still think bumping the file magic number is a better > answer. It won't make the behavior any worse for un-updated code than > it is already. But it also won't make it any better, so why even bother? The goal of catversion bumps is to replace crashes or unpredictable misbehavior with a nice error message that tells you exactly what the problem is. Here we'd just be replacing an infinite series of crashes with an infinite series of crashes with a slightly different error message. It's probably worth comparing those error messages: FATAL: could not read file "global/pg_filenode.map": read 512 of 524 FATAL: relation mapping file "global/pg_filenode.map" contains invalid data The first message is what you get now. The second message is what you get with the proposed change to the magic number. I would argue that the second message is actually worse than the first one, because the first one actually gives you some hint what the problem is, whereas the second one really just says that an unspecified bad thing happened. In short, I think Alvaro's idea is unprincipled but solves the problem of making it clear that a new initdb is required. Your idea is principled but does not solve any problem. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > In short, I think Alvaro's idea is unprincipled but solves the problem > of making it clear that a new initdb is required. Your idea is > principled but does not solve any problem. [ shrug... ] Fair enough. regards, tom lane