Re: BUG #17928: Standby fails to decode WAL on termination of primary

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: BUG #17928: Standby fails to decode WAL on termination of primary
Дата
Msg-id ZPKQA+HHqLVRFBSs@paquier.xyz
обсуждение исходный текст
Ответ на Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Sergei Kornilov <sk@zsrv.org>)
Список pgsql-bugs
On Mon, Aug 21, 2023 at 08:32:39AM +0900, Michael Paquier wrote:
> I am not sure that I'll be able to do more on this topic this week, at
> least that's some progress.

Some time later..

I have spent more time and thoughts on the whole test suite, finishing
with the attached 0003 that applies on top of your own patches.  I am
really looking forward to making this whole logic more robust, so as
WAL replay can be made itself more robust for the OOM/end-of-wal
detection on HEAD for standbys and crash recovery.

While looking at the whole, I have considered a few things that may
make the test cleaner, like:
- Calculating the segment name and its offset from the end_lsn of a
record directly from the backend, but it felt inelegant to pass
through more subroutine layers the couple ($segment, offset) rather
than just a LSN, so guessing the segment number and the offset while
the cluster is offline if OK by me.
- The TLI can be queried from the server rather than hardcoded.
- I've been thinking about bundling the tests of each sub-section in
their own subroutine, but that felt a bit awkward, particularly for
the part where we need a correct $prev_lsn in the record header
written to enforce other code paths.
- The test needs better documentation.  One of the things I kept
staring at is cross-checking pack() and the dependency to the C
structures, so I have added more details there, explaining more the
whys and the hows.

I have also looked again at the C code for a few hours, and still got
the impression that this is rather solid.  There are two things that
may be better:
- Document at the top of allocate_recordbuf() that this should never
be called with a length coming from a header until it is validated.
- Removing AllocSizeIsValid() for the non-FRONTEND path should be OK.

What do you think?
--
Michael

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon
Следующее
От: Sergei Kornilov
Дата:
Сообщение: Re: BUG #17928: Standby fails to decode WAL on termination of primary