Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
От | Robert Haas |
---|---|
Тема | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows |
Дата | |
Msg-id | CA+TgmoZFjh3Ui9n77quMS278si_sWJNnRkLKDzzXDS-0v=SYFA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
(Thomas Munro <thomas.munro@gmail.com>)
|
Список | pgsql-bugs |
On Fri, Oct 6, 2023 at 12:50 AM Thomas Munro <thomas.munro@gmail.com> wrote: > If you trace the checkpointer's system calls you will see that > base/5/16384 (or whatever t's relfilenode is for you) is *not* fsync'd > by checkpoint #2. The following checkpoint #3 might eventually do it, > but if the kernel loses power after checkpoint #2 completes and there > is no checkpoint #3, the kernel might forget the truncation, and yet > replay starts too late to redo it. I think that bad sequence looks > like this: > > P1: log truncate > P2: choose redo LSN > P1: DropRelationBuffers() > P2: CheckPointBuffers() > P2: ProcessSyncRequests() > P1: ftruncate() > P1: RegisterSyncRequest() > P2: log checkpoint > *** system loses power *** > > I realise it is a different problem than the one reported, but it's > close. My initial thought is that perhaps we shouldn't allow a redo > LSN to be chosen until the sync request is registered, which is also > fairly close to the critical section boundaries being discussed for > ftruncate() error case. But that's not a phase the checkpoint delay > machinery currently knows how to delay. And there may well be better > ways... Hmm. So the problem in this case is that we not only need the WAL to be inserted and the filesystem operation to be performed in the same checkpoint cycle, but we also need the sync request to be processed in that checkpoint cycle. In this example, the WAL insertion happens before the redo LSN, so the truncate on disk must happen before the checkpoint completes, which is guaranteed, and the sync request must also be processed before the checkpoint completes, which is not guaranteed. We only guarantee that the checkpoint goes into the checkpointer's queue before the checkpoint completes, not that it gets processed before the checkpoint completes. Suppose that RelationTruncate set both DELAY_CHKPT_START and DELAY_CHKPT_COMPLETE. I think that would prevent this problem. P2 could still choose the redo LSN after P1 logged the truncate, but it wouldn't then be able to reach CheckPointBuffers() until after P1 had reached RegisterSyncRequest. Note that setting *only* DELAY_CHKPT_START isn't good enough, because then we can get this history: P1: log truncate P2: choose redo LSN P2: CheckPointBuffers() P1: DropRelationBuffers() P2: ProcessSyncRequests() P2: log checkpoint *** system loses power *** I think, in general, that DELAY_CHKPT_START is useful for cases where CheckPointGuts() is going to flush something that might not be locked at the time we WAL-log a modification to it. For instance, MarkBufferDirtyHint() wants to log that the buffer will be dirty while holding only a shared lock on the buffer. RecordTransactonCommit() and RecordTransactionCommitPrepared() write WAL that necesitate CLOG updates, but they haven't yet locked the relevant CLOG buffers. DELAY_CHKPT_END, on the other hand, is useful for cases where some operation needs to get completed before the redo pointer gets moved. What I think we missed in 412ad7a55639516f284cd0ef9757d6ae5c7abd43 is that RelationTruncate actually falls into both categories -- the ftruncate() needs to be protected by DELAY_CHKPT_END so that it happens before the redo pointer moves, but the RegisterSyncRequest() needs to be protected by DELAY_CHKPT_START so that it gets "flushed" by ProcessSyncRequests. I'm not quite sure that this idea closes all the holes, though. The delay-checkpoint mechanism is quite hard to reason about. I spent a while this morning trying to think up something better but realized at the end I'd just reinvented the wheel. -- Robert Haas EDB: http://www.enterprisedb.com
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Robert HaasДата:
Сообщение: Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Следующее
От: Thomas MunroДата:
Сообщение: Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows