Обсуждение: pg_rewind just doesn't fsync *anything*?
Hi, how come that the only comment in pg_rewind about fsyncing is ' void close_target_file(void) { .../* fsync? */ } Isn't that a bit, uh, minimal for a utility that's likely to be used in failover scenarios? I think we might actually be "saved" due to http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f33 because pg_rewind appears to leave the cluster in ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY; updateControlFile(&ControlFile_new); a state that StartupXLOG will treat as needing recovery: if (ControlFile->state != DB_SHUTDOWNED && ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY) SyncDataDirectory(); but that code went in after pg_rewind, so this certainly can't be an intentional save. I also don't think it's ok that you need to start the cluster to make it safe against a crash? I guess the easiest fix would be to shell out to initdb -s? Greetings, Andres Freund
On Thu, Mar 10, 2016 at 4:43 AM, Andres Freund <andres@anarazel.de> wrote: > how come that the only comment in pg_rewind about fsyncing is ' > void > close_target_file(void) > { > ... > /* fsync? */ > } > > Isn't that a bit, uh, minimal for a utility that's likely to be used in > failover scenarios? > > I think we might actually be "saved" due to > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f33 > because pg_rewind appears to leave the cluster in > > ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY; > updateControlFile(&ControlFile_new); Yep, with minimum recovery target changed as well up to the LSN where pg_rewind began. > a state that StartupXLOG will treat as needing recovery: > > if (ControlFile->state != DB_SHUTDOWNED && > ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY) > SyncDataDirectory(); > > but that code went in after pg_rewind, so this certainly can't be an > intentional save. > I also don't think it's ok that you need to start the cluster to make it > safe against a crash? No, that's not acceptable. One is not obliged to use the rewound data folder immediately, and my first intuition is that we had better guarantee that an entry in the file map gets consistent on disk immediately after being done operating on it. If we get a power loss after pg_rewind is done we may lose data silently. > I guess the easiest fix would be to shell out to initdb -s? What do you mean? I am not sure what initdb has to do with that as we have no need for it in pg_rewind. -- Michael
At 2016-03-10 08:35:43 +0100, michael.paquier@gmail.com wrote: > > > I guess the easiest fix would be to shell out to initdb -s? > > What do you mean? I am not sure what initdb has to do with that as we > have no need for it in pg_rewind. initdb -S/--sync-only fsyncs everything in the data directory and exits. -- Abhijit
On Thu, Mar 10, 2016 at 8:37 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > At 2016-03-10 08:35:43 +0100, michael.paquier@gmail.com wrote: >> >> > I guess the easiest fix would be to shell out to initdb -s? >> >> What do you mean? I am not sure what initdb has to do with that as we >> have no need for it in pg_rewind. > > initdb -S/--sync-only fsyncs everything in the data directory and exits. Missed your point, good to know that initdb is not doing anything else with -S than fsyncing everything in PGDATA. Still, I think that we had better fsync only entries that are modified by pg_rewind, and files that got updated, and not the whole directory, a target data folder should be stopped properly to be able to rewind, and it is better to avoid dependencies between utilities if that's not strictly necessary. initdb is likely to be installed side-by-side with pg_rewind in any distribution though. -- Michael
Hi, On 2016-03-10 08:47:16 +0100, Michael Paquier wrote: > Still, I think that we had better fsync only entries that are modified > by pg_rewind, and files that got updated, and not the whole directory Why? If any files in there are dirty, they need to be fsynced. If they're not dirty, fsync's free. > a target data folder should be stopped properly to be able to rewind, > and it is better to avoid dependencies between utilities if that's not > strictly necessary. initdb is likely to be installed side-by-side > with pg_rewind in any distribution though. It's not like we don't have any other such dependencies, in other binaries. I'm not concerned. Having to backpatch a single system() invocation + find_other_exec() call, and backporting a more general FRONTEND version of initdb's fsync_pgdata() are wildly differing in complexity. - Andres
On Thu, Mar 10, 2016 at 7:52 PM, Andres Freund <andres@anarazel.de> wrote: >> a target data folder should be stopped properly to be able to rewind, >> and it is better to avoid dependencies between utilities if that's not >> strictly necessary. initdb is likely to be installed side-by-side >> with pg_rewind in any distribution though. > > It's not like we don't have any other such dependencies, in other > binaries. I'm not concerned. > > Having to backpatch a single system() invocation + find_other_exec() > call, and backporting a more general FRONTEND version of initdb's > fsync_pgdata() are wildly differing in complexity. Talking about HEAD, wouldn't the dependency tree be cleaner if there is a common facility in src/common? For back-branches, I won't argue against simplicity, those are more reliable solutions. -- Michael
On 2016-03-10 20:31:55 +0100, Michael Paquier wrote: > On Thu, Mar 10, 2016 at 7:52 PM, Andres Freund <andres@anarazel.de> wrote: > > Having to backpatch a single system() invocation + find_other_exec() > > call, and backporting a more general FRONTEND version of initdb's > > fsync_pgdata() are wildly differing in complexity. > > Talking about HEAD, wouldn't the dependency tree be cleaner if there > is a common facility in src/common? Yea, probably; but lets sort out the backbranches first. Andres
On 2016-03-09 19:43:52 -0800, Andres Freund wrote: > Hi, > > how come that the only comment in pg_rewind about fsyncing is ' > void > close_target_file(void) > { > ... > /* fsync? */ > } > > Isn't that a bit, uh, minimal for a utility that's likely to be used in > failover scenarios? > > I think we might actually be "saved" due to > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f33 > because pg_rewind appears to leave the cluster in > > ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY; > updateControlFile(&ControlFile_new); > > a state that StartupXLOG will treat as needing recovery: > > if (ControlFile->state != DB_SHUTDOWNED && > ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY) > SyncDataDirectory(); > > but that code went in after pg_rewind, so this certainly can't be an > intentional save. > > I also don't think it's ok that you need to start the cluster to make it > safe against a crash? > > I guess the easiest fix would be to shell out to initdb -s? I've pushed a modified version of the fix that Michael posted in http://archives.postgresql.org/message-id/CAB7nPqRmM%2BCX6bVxw0Y7mMVGMFj1S8kwhevt8TaP83yeFRfbXA%40mail.gmail.com Andres
On Mon, Mar 28, 2016 at 6:52 AM, Andres Freund <andres@anarazel.de> wrote: > I've pushed a modified version of the fix that Michael posted in > http://archives.postgresql.org/message-id/CAB7nPqRmM%2BCX6bVxw0Y7mMVGMFj1S8kwhevt8TaP83yeFRfbXA%40mail.gmail.com Thanks. -- Michael