all_visible replay aborting due to uninitialized pages

From: Andres Freund
Subject: all_visible replay aborting due to uninitialized pages
Msg-id: 20130528175802.GA26645@awork2.anarazel.de
Replies: Re: all_visible replay aborting due to uninitialized pages  (Robert Haas <robertmhaas@gmail.com>)
Re: all_visible replay aborting due to uninitialized pages  (Noah Yetter <nyetter@gmail.com>)
List: pgsql-hackers

Hi,

A customer of ours reported a standby losing sync with the primary due
to the following error:
CONTEXT:  xlog redo visible: rel 1663/XXX/XXX; blk 173717
WARNING:  page 173717 of relation base/XXX/XXX is uninitialized
...
PANIC:  WAL contains references to invalid pages

Guessing around, I looked and noticed the following problematic pattern:
1) A: wants to do an update, doesn't have enough free space
2) A: extends the relation on the filesystem level (RelationGetBufferForTuple)
3) A: does PageInit (RelationGetBufferForTuple)
4) A: aborts, e.g. due to a serialization failure (heap_update)

At this point the page is initialized in memory, but not WAL-logged. It
isn't pinned or locked either.

5) B: vacuum finds that page and sees that it's empty, so it marks it
all-visible. But since the page wasn't written out (we haven't even
marked it dirty in 3.), the standby doesn't know about it and reports
the page as being uninitialized.
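
The five steps above can be sketched as a toy model. This is not PostgreSQL
code: the classes Page, Primary and Standby, the record names "init" and
"visible", and the function reproduce() are all made up for illustration,
and the standby is assumed to see a zeroed page for any block it has
never received WAL for.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    initialized: bool = False   # has PageInit run on this copy of the page?
    all_visible: bool = False


class InvalidPageError(Exception):
    """Stands in for the PANIC about references to invalid pages."""


@dataclass
class Primary:
    buffers: dict = field(default_factory=dict)  # block number -> in-memory Page
    wal: list = field(default_factory=list)      # records shipped to the standby

    def extend_and_init(self, blk):
        # Steps 2 and 3: extend the relation and initialize the new buffer.
        # Nothing is WAL-logged and the buffer is not marked dirty.
        self.buffers[blk] = Page(initialized=True)

    def vacuum_mark_all_visible(self, blk):
        # Step 5: vacuum sees an initialized, empty page and emits only
        # a "visible" record that refers to the block.
        page = self.buffers[blk]
        if page.initialized:
            page.all_visible = True
            self.wal.append(("visible", blk))


@dataclass
class Standby:
    pages: dict = field(default_factory=dict)    # block number -> Page

    def replay(self, wal):
        for kind, blk in wal:
            # A block the standby never got WAL for is modeled as zeroed.
            page = self.pages.setdefault(blk, Page())
            if kind == "init":
                page.initialized = True
            elif kind == "visible":
                if not page.initialized:
                    raise InvalidPageError(f"page {blk} is uninitialized")
                page.all_visible = True


def reproduce(blk=173717):
    primary, standby = Primary(), Standby()
    primary.extend_and_init(blk)            # session A, steps 1 to 3
    # Step 4: A aborts before inserting a tuple, so no record mentions blk.
    primary.vacuum_mark_all_visible(blk)    # session B, step 5
    try:
        standby.replay(primary.wal)
    except InvalidPageError as exc:
        return str(exc)
    return None
```

Calling reproduce() returns the error text instead of None, because the
only record the standby ever sees for the block is the "visible" one.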

ISTM the best backbranchable fix for this is to teach lazy_scan_heap to
log an FPI for the heap page via visibilitymap_set in that rather
limited case.
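
A minimal sketch of why attaching a full-page image (FPI) would help,
again as a toy model rather than actual PostgreSQL code. The record layout,
the functions vacuum_visible_record() and replay(), and the flag
page_never_logged are hypothetical; in particular, how lazy_scan_heap
would detect the "rather limited case" is left open here.

```python
def vacuum_visible_record(blk, page_never_logged):
    # Hypothetical rule: attach the heap page image only when the page
    # may never have been WAL-logged before.
    return {"type": "visible", "blk": blk, "fpi": page_never_logged}


def replay(wal):
    """Replay records on a standby whose pages all start out zeroed."""
    initialized = set()
    all_visible = set()
    for record in wal:
        blk = record["blk"]
        if record.get("fpi"):
            # A full-page image overwrites whatever the standby had,
            # so the page is initialized after restoring it.
            initialized.add(blk)
        if blk not in initialized:
            raise RuntimeError(f"page {blk} is uninitialized")
        all_visible.add(blk)
    return all_visible
```

With the image attached, replay([vacuum_visible_record(7, True)]) succeeds
and marks block 7 all-visible; without it, the same call raises
RuntimeError, which mirrors the failure reported above.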

Happy to provide a patch unless somebody has a better idea?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


