Re: [BUG] Archive recovery failure on 9.3+.

From: Kyotaro HORIGUCHI
Subject: Re: [BUG] Archive recovery failure on 9.3+.
Date:
Msg-id: 20140214.173857.65272356.horiguchi.kyotaro@lab.ntt.co.jp
In response to: Re: [BUG] Archive recovery failure on 9.3+.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: [BUG] Archive recovery failure on 9.3+.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers

Hello,

Before taking up the topic..

At Thu, 13 Feb 2014 19:45:38 +0200, Heikki Linnakangas wrote
> On 02/13/2014 06:47 PM, Heikki Linnakangas wrote:
> > On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
> >> The behavior where we prefer a segment from archive with lower TLI over
> >> a file with higher TLI in pg_xlog actually changed in commit
> >> a068c391ab0. Arguably changing it wasn't a good idea, but the problem
> >> your test script demonstrates can be fixed by not archiving the partial
> >> segment, with no change to the preference of archive/pg_xlog. As
> >> discussed, archiving a partial segment seems like a bad idea anyway, so
> >> let's just stop doing that.

That certainly makes things simple, and I rather like the idea. But
as long as the final, possibly partial, segment of the lower TLI
is actually created, and the recovery mechanism lets users request
recovery operations that require such segments
(recovery_target_timeline does this), I believe a "perfect
archive", meaning an archive that can serve every kind of restore
operation, will necessarily contain such duplicate segments.
Besides, I suppose that policy would make archive/restore
operations much harder. DBAs would be stuck with the tedious work
of picking, out of the haystack, only the segments actually needed
for the recovery at hand. That sounds somewhat gloomy..

# However, I also doubt whether it is appropriate to stockpile
# archive segments spanning so many timelines; two generations are
# enough to cause this issue.

Anyway, returning to the topic,

> > After some further thought, while not archiving the partial segment
> > fixes your test script, it's not enough to fix all variants of the
> > problem. Even if archive recovery doesn't archive the last, partial,
> > segment, if the original master server is still running, it's entirely
> > possible that it fills the segment and archives it. In that case,
> > archive recovery will again prefer the archived segment with lower TLI
> > over the segment with newer TLI in pg_xlog.

Yes, that is the generalized description of the case I
mentioned. (Though I had not got as far as that thought myself :)

> > So I agree we should commit the patch you posted (or something to that
> > effect). The change to not archive the last segment still seems like a
> > good idea, but perhaps we should only do that in master.

My opinion on duplicate segments on older timelines is as
described above.

> To draw this to conclusion, barring any further insights to this, I'm
> going to commit the attached patch to master and REL9_3_STABLE. Please
> have a look at the patch, to see if I'm missing something. I modified
> the state machine to skip over XLOG_FROM_XLOG state, if reading in
> XLOG_FROM_ARCHIVE failed; otherwise you first scan the archive and
> pg_xlog together, and then pg_xlog alone, which is pointless.
> 
> In master, I'm also going to remove the "archive last segment on old
> timeline" code.

Thank you for finishing the patch. I had not thought about the
behavior after an XLOG_FROM_ARCHIVE failure. With this change the
state machine seems to go around without the extra round. The
recovery process becomes able to pick up the segment on the
highest (expected) TLI among those with the same segment id,
regardless of their location. I think the recovery process will
now cope with the "perfect" archives described above for all types
of recovery operation. The state machine loop that handles
fallback from archive to pg_xlog now seems somewhat more
complicated than needed, but it does no harm.
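
To make the selection rule above concrete, here is a minimal sketch. It
is not the code in xlog.c: SegLocation, SegCandidate and pick_segment
are hypothetical names invented for this illustration, and the candidate
list stands in for whatever the archive and pg_xlog happen to contain.
It only shows the idea that the highest expected TLI wins, wherever the
copy lives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration, not PostgreSQL's definitions. */
typedef uint32_t TimeLineID;

typedef enum { LOC_ARCHIVE, LOC_PG_XLOG } SegLocation;

typedef struct
{
    uint64_t    segno;      /* segment number */
    TimeLineID  tli;        /* timeline this copy belongs to */
    SegLocation location;   /* where this copy was found */
} SegCandidate;

/* Is 'tli' part of the expected timeline history? */
static int
tli_is_expected(TimeLineID tli, const TimeLineID *expected, size_t nexpected)
{
    for (size_t i = 0; i < nexpected; i++)
    {
        if (expected[i] == tli)
            return 1;
    }
    return 0;
}

/*
 * Return the copy of segment 'segno' with the highest expected TLI,
 * ignoring whether it sits in the archive or in pg_xlog.  Returns NULL
 * if no copy qualifies.  Assumption: on equal TLIs the first candidate
 * in the array wins; the sketch does not model any other tie-break.
 */
static const SegCandidate *
pick_segment(const SegCandidate *cands, size_t ncands, uint64_t segno,
             const TimeLineID *expected, size_t nexpected)
{
    const SegCandidate *best = NULL;

    assert(cands != NULL || ncands == 0);

    for (size_t i = 0; i < ncands; i++)
    {
        if (cands[i].segno != segno)
            continue;
        if (!tli_is_expected(cands[i].tli, expected, nexpected))
            continue;
        if (best == NULL || cands[i].tli > best->tli)
            best = &cands[i];
    }
    return best;
}
```

With an archived copy of a segment on TLI 1 and a pg_xlog copy of the
same segment on TLI 2, pick_segment returns the pg_xlog copy, which is
the outcome the fix is after.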

However, this part, which was in my original patch,

>          readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
>                                        currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY : currentSource);

sticks far out past the line-wrapping boundary and looks somewhat
dirty :(

Also, the conditional operator seems to blur the meaning of
XLOG_FROM_ARCHIVE and XLOG_FROM_ANY a bit. But I failed to unify
them either way, so it is left as is..
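
For illustration only, one way to keep the call site short and give the
mapping a name would be a small helper. This is not part of the attached
patch: read_source_for is a hypothetical name, and the enum below is a
local copy written for this sketch so that it compiles on its own.

```c
/* Local copy of the source states for this sketch only. */
typedef enum
{
    XLOG_FROM_ANY = 0,      /* request: read from any available source */
    XLOG_FROM_ARCHIVE,      /* state: archive (and pg_xlog) */
    XLOG_FROM_PG_XLOG,      /* state: pg_xlog only */
    XLOG_FROM_STREAM        /* state: streamed via walreceiver */
} XLogSource;

/*
 * Hypothetical helper: translate the state machine's current state into
 * the source to ask the segment reader for.  In the archive state both
 * the archive and pg_xlog are searched, so ask for "any"; every other
 * state is passed through unchanged.
 */
static XLogSource
read_source_for(XLogSource currentSource)
{
    return (currentSource == XLOG_FROM_ARCHIVE) ? XLOG_FROM_ANY
                                                : currentSource;
}
```

The call would then read XLogFileReadAnyTLI(readSegNo, DEBUG2,
read_source_for(currentSource)), which fits within the line limit. It
does not remove the overlap in meaning between the two values; it only
gives it one named place.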

Finally, the attached patch differs from your last patch only in
the styling mentioned above. It applies to current HEAD, and I
confirmed that it fixes this issue, but I have not tested the
lastSourceFailed section; simply removing a file could not get
execution there.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 508970a..85a0ce9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11006,17 +11006,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
     /*-------
      * Standby mode is implemented by a state machine:
      *
-     * 1. Read from archive (XLOG_FROM_ARCHIVE)
-     * 2. Read from pg_xlog (XLOG_FROM_PG_XLOG)
-     * 3. Check trigger file
-     * 4. Read from primary server via walreceiver (XLOG_FROM_STREAM)
-     * 5. Rescan timelines
-     * 6. Sleep 5 seconds, and loop back to 1.
+     * 1. Read from either archive or pg_xlog (XLOG_FROM_ARCHIVE), or just
+     *    pg_xlog (XLOG_FROM_XLOG)
+     * 2. Check trigger file
+     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+     * 4. Rescan timelines
+     * 5. Sleep 5 seconds, and loop back to 1.
      *
      * Failure to read from the current source advances the state machine to
-     * the next state. In addition, successfully reading a file from pg_xlog
-     * moves the state machine from state 2 back to state 1 (we always prefer
-     * files in the archive over files in pg_xlog).
+     * the next state.
      *
      * 'currentSource' indicates the current state. There are no currentSource
      * values for "check trigger", "rescan timelines", and "sleep" states,
@@ -11044,9 +11042,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
             switch (currentSource)
             {
                 case XLOG_FROM_ARCHIVE:
-                    currentSource = XLOG_FROM_PG_XLOG;
-                    break;
-
                 case XLOG_FROM_PG_XLOG:
                     /*
@@ -11212,7 +11207,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                  * Try to restore the file from archive, or read an existing
                  * file from pg_xlog.
                  */
-                readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource);
+                readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
+                        currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
+                                         currentSource);
                 if (readFile >= 0)
                     return true;   /* success! */
