hot standby startup, visibility map, clog

From Daniel Farina
Subject hot standby startup, visibility map, clog
Date
Msg-id BANLkTinCXfATbbPdXpz1OMW9A1Terg9hLQ@mail.gmail.com
Responses Re: hot standby startup, visibility map, clog  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hello list,

A little while ago I posted about how my ... exciting ....
backup procedure caused occasional problems starting due to clog not
being big enough
(http://archives.postgresql.org/pgsql-hackers/2011-04/msg01148.php). I
recently had a reproduction and a little bit of luck, and I think I
have a slightly better idea of what may be causing this.

The first fact is that turning off hot standby will let the cluster
start up, but only after seeing a spate of messages like these (a dozen
or dozens, not thousands):

2011-06-09 08:02:32 UTC  LOG:  restored log file
"000000020000002C000000C0" from archive
2011-06-09 08:02:33 UTC  WARNING:  xlog min recovery request
2C/C1F09658 is past current point 2C/C037B278
2011-06-09 08:02:33 UTC  CONTEXT:  writing block 0 of relation
base/16385/16784_vm
xlog redo insert: rel 1663/16385/128029; tid 114321/63
2011-06-09 08:02:33 UTC  LOG:  restartpoint starting: xlog
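
To make the WARNING above concrete, here is a minimal Python sketch that
parses the two LSNs and works out how far apart they are. The helper names
parse_lsn and wal_segment_name are hypothetical, the 16 MB segment size is
the stock default and is assumed here, and the reading of the message (the
first LSN is the one being requested for the page being written, the second
is how far replay has reached) is my interpretation rather than something
stated in the log itself.

```python
def parse_lsn(text):
    """Convert an 'xlogid/xrecoff' string such as '2C/C037B278' to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(timeline, lsn, segment_size=16 * 1024 * 1024):
    """Name of the WAL segment file that contains lsn (assumes 16 MB segments)."""
    log_id = lsn >> 32
    seg_no = (lsn & 0xFFFFFFFF) // segment_size
    return "%08X%08X%08X" % (timeline, log_id, seg_no)


# Values copied from the WARNING line; timeline 2 is taken from the
# restored file name "000000020000002C000000C0".
requested = parse_lsn("2C/C1F09658")  # assumed: LSN on the _vm page being written
replayed = parse_lsn("2C/C037B278")   # assumed: point that recovery had reached

gap = requested - replayed
print(gap)
print(wal_segment_name(2, replayed))
print(wal_segment_name(2, requested))
```

Under those assumptions the gap is 28894176 bytes (about 27.6 MiB): the
replay point falls in segment 000000020000002C000000C0, the file just
restored from the archive, while the requested LSN falls in the next
segment, 000000020000002C000000C1.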

Most importantly, *all* such messages are in visibility map forks
(_vm).  I am reasonably confident that my code does not start reading
data until pg_start_backup() has returned, and blocks on
pg_stop_backup() after having read all the data.  Also, the mailing
list correspondence at
http://archives.postgresql.org/pgsql-hackers/2010-11/msg02034.php
suggests that the visibility map is not flushed at checkpoints, so
perhaps with some poor timing an old page can wander onto disk even
after a checkpoint barrier that pg_start_backup waits for. (I have not
yet found the critical section that makes visibility map buffers immune
to checkpoints, though).
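
The ordering claimed above (no data is read before pg_start_backup() has
returned, and pg_stop_backup() is called only after all data has been read)
can be sketched roughly as follows. This is not the actual backup code:
run_sql and copy_data_dir are hypothetical callables standing in for a
database connection and for whatever copies the data directory, and the
label is made up.

```python
def base_backup(run_sql, copy_data_dir, label="example-label"):
    """Run a base backup in the order described above.

    run_sql(query, params) and copy_data_dir() are hypothetical helpers
    supplied by the caller; nothing here talks to a real server.
    """
    # 1. Do not touch any data files until pg_start_backup() has returned,
    #    that is, until the checkpoint it waits for has completed.
    start_location = run_sql("SELECT pg_start_backup(%s)", (label,))
    try:
        # 2. Read every file under the data directory.
        copy_data_dir()
    finally:
        # 3. Only after all data has been read, end backup mode.
        stop_location = run_sql("SELECT pg_stop_backup()", ())
    return start_location, stop_location
```

If the ordering really is as above, any stale _vm page in the backup would
have to come from a buffer that the starting checkpoint did not write out,
which is the scenario the paragraph speculates about.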

Given all that, suppose the smgr's generic read path, which checks the
LSN and possibly the clog (but apparently only in hot standby mode,
since pre-hot-standby the clog's intermediate states were not so
interesting...), has a problem with such uncheckpointed pages.  It
would then seem reasonable that the system now refuses to start,
whereas it once did.

FWIW, letting recovery run without hot standby for a little while,
canceling, and then starting again after the danger zone had passed
would allow recovery to proceed correctly, as one might expect.

Thoughts?

-- 
fdr

