On Fri, 2 Feb 2024 11:18:18 +0100
Thomas Munro <thomas.munro@gmail.com> wrote:
> Hi,
>
> New WAL space is created by renaming a file into place. Either a
> newly created file with a temporary name or, ideally, a recyclable old
> file with a name derived from an old LSN. I think there is a data
> loss window between rename() and fsync(parent_directory). A
> concurrent backend might open(new_name), write(), fdatasync(), and
> then we might lose power before the rename hits the disk. The data
> itself would survive the crash, but recovery wouldn't be able to find
> and replay it. That might break the log-before-data rule or forget a
> transaction that has been reported as committed to a client.
>
> Actual breakage would presumably require really bad luck, and I
> haven't seen this happen or anything, it just occurred to me while
> reading code, and I can't see any existing defences.
>
> One simple way to address that would be to make XLogFileInitInternal()
> wait for InstallXLogFileSegment() to finish. It's a little

Or, can we make sure the rename is durable by calling fsync before
returning the fd, as in the attached patch?
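
To illustrate the idea only (this is not the attached patch), here is a
minimal POSIX sketch: rename the file into place, fsync the parent
directory so the new directory entry is durable, and only then open the
file and return the fd. The names fsync_parent_dir() and
install_and_open() are hypothetical and are not PostgreSQL functions.
The sketch assumes both paths are in the same directory and that
fsync() on a directory fd is supported by the platform.

```c
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fsync the directory that contains "path". */
static int
fsync_parent_dir(const char *path)
{
    char        buf[1024];
    int         dirfd;
    int         rc;

    /* dirname() may modify its argument, so work on a copy. */
    if (strlen(path) >= sizeof(buf))
        return -1;
    strcpy(buf, path);

    dirfd = open(dirname(buf), O_RDONLY);
    if (dirfd < 0)
        return -1;

    rc = fsync(dirfd);
    close(dirfd);
    return rc;
}

/*
 * Hypothetical sketch: rename oldpath to newpath, make the new directory
 * entry durable, and only then hand out a file descriptor.  Returns the
 * fd on success, or -1 on failure.
 */
static int
install_and_open(const char *oldpath, const char *newpath)
{
    assert(oldpath != NULL && newpath != NULL);

    if (rename(oldpath, newpath) != 0)
        return -1;

    /* Without this step the rename could be lost on power failure. */
    if (fsync_parent_dir(newpath) != 0)
        return -1;

    return open(newpath, O_RDWR);
}
```

With this ordering, a caller that receives the fd cannot write and
fdatasync() data into a file whose name has not yet reached disk. The
cost is one directory fsync per installed segment.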

Regards,
Yugo Nagata

> pessimistic to do that unconditionally, though, as then you have to
> wait even for rename operations for segment files later than the one
> you're opening, so I thought about how to skip waiting in that case --
> see 0002. I'm not sure if it's worth worrying about or not.
--
Yugo NAGATA <nagata@sraoss.co.jp>