Re: Infinite loop in XLogPageRead() on standby

From: Alexander Kukushkin
Subject: Re: Infinite loop in XLogPageRead() on standby
Msg-id: CAFh8B=nPSERv7NyYHmjVXK4xK3va1XzU3-rhOswjgEZMWkV=RQ@mail.gmail.com
In reply to: Re: Infinite loop in XLogPageRead() on standby (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers

Hi Michael,

On Thu, 29 Feb 2024 at 06:05, Michael Paquier <michael@paquier.xyz> wrote:

> Wow.  Have you seen that in an actual production environment?

Yes, we see it regularly, and it is reproducible in test environments as well.
 
> my $start_page = start_of_page($end_lsn);
> my $wal_file = write_wal($primary, $TLI, $start_page,
>                          "\x00" x $WAL_BLOCK_SIZE);
> # copy the file we just "hacked" to the archive
> copy($wal_file, $primary->archive_dir);
>
> So you are emulating a failure by filling with zeros the second page
> where the last emit_message() generated a record, and the page before
> that includes the continuation record.  Then abuse of WAL archiving to
> force the replay of the last record.  That's kind of cool.

Right, at this point that is easier than causing an artificial crash on the primary after it has finished writing just one page.
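
For readers who do not have the test helpers at hand, the page arithmetic behind the quoted snippet can be sketched roughly as follows. This is a Python illustration, not the Perl from the test: `start_of_page` and `zero_page` here are hypothetical stand-ins for the test's helpers, the LSN is assumed to be a plain integer byte position, and the 8 kB block size and 16 MB segment size are the PostgreSQL defaults rather than values taken from this thread.

```python
import os
import tempfile

WAL_BLOCK_SIZE = 8192                 # assumed: PostgreSQL default XLOG_BLCKSZ
WAL_SEGMENT_SIZE = 16 * 1024 * 1024   # assumed: default wal_segment_size


def start_of_page(lsn):
    """Round an LSN (integer byte position) down to the start of its WAL page."""
    return lsn - (lsn % WAL_BLOCK_SIZE)


def zero_page(segment_path, lsn):
    """Overwrite the WAL page containing `lsn` with zeros; return the file offset."""
    offset = start_of_page(lsn) % WAL_SEGMENT_SIZE
    with open(segment_path, "r+b") as f:
        f.seek(offset)
        f.write(b"\x00" * WAL_BLOCK_SIZE)
    return offset


def demo():
    """Build a fake three-page file and zero its second page."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\xff" * (3 * WAL_BLOCK_SIZE))
        offset = zero_page(path, WAL_BLOCK_SIZE + 100)
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.remove(path)
    return offset, data
```

Calling `demo()` returns the offset of the zeroed page together with the file contents, so it is easy to check that only the page containing the given position was overwritten and its neighbours were left alone.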
 
> > To be honest, I don't know yet how to fix it nicely. I am thinking about
> > returning XLREAD_FAIL from XLogPageRead() if it suddenly switched to a new
> > timeline while trying to read a page and if this page is invalid.
>
> Hmm.  I suspect that you may be right on a TLI change when reading a
> page.  There are a bunch of side cases with continuation records and
> header validation around XLogReaderValidatePageHeader().  Perhaps you
> have an idea of patch to show your point?

Not yet, but hopefully I will get something done next week.
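
Until there is a patch, the idea quoted above can be modeled with a toy read loop. This is only a sketch of the control flow as described in this message, not PostgreSQL code: `read_page`, `ReadResult`, the `(tli, valid)` tuple returned by `fetch`, and the retry limit are all hypothetical, and the real XLogPageRead() handles many more cases.

```python
from enum import Enum


class ReadResult(Enum):
    SUCCESS = "success"
    FAIL = "fail"   # stands in for XLREAD_FAIL


def read_page(fetch, expected_tli, max_retries=1000):
    """Toy retry loop around a page source.

    `fetch()` is a hypothetical callable returning (tli, valid): the timeline
    the page came from and whether its header passed validation.
    """
    for _ in range(max_retries):
        tli, valid = fetch()
        if valid:
            return ReadResult.SUCCESS
        if tli != expected_tli:
            # Idea from the thread: the timeline switched while reading and
            # the page is invalid, so report failure instead of retrying.
            return ReadResult.FAIL
        # Same timeline, invalid page: retry. The limit exists only so that
        # this toy model terminates.
    raise RuntimeError("retry limit reached in toy model")
```

In this model, a source that keeps returning the same invalid page from a different timeline fails on the first attempt instead of being retried over and over.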
 

> Nit.  In your test, it seems to me that you should not call directly
> set_standby_mode and enable_restoring, just rely on has_restoring with
> the standby option included.

Thanks, I'll look into it. 

Regards,
--
Alexander Kukushkin
