Re: BUG #4879: bgwriter fails to fsync the file in recovery mode

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
Дата
Msg-id 4A44A9A3.1060902@enterprisedb.com
обсуждение исходный текст
Ответ на Re: BUG #4879: bgwriter fails to fsync the file in recovery mode  (Simon Riggs <simon@2ndQuadrant.com>)
Ответы Re: BUG #4879: bgwriter fails to fsync the file in recovery mode  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #4879: bgwriter fails to fsync the file in recovery mode  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Simon Riggs wrote:
> On Fri, 2009-06-26 at 05:14 +0100, Simon Riggs wrote:
>> On Thu, 2009-06-25 at 20:25 -0400, Tom Lane wrote:
>
>>> What I am thinking is that instead of the hack of clearing
>>> LocalRecoveryInProgress to allow the current process to write WAL,
>>> we should have a separate test function WALWriteAllowed() with a
>>> state variable LocalWALWriteAllowed, and the hack should set that
>>> state without playing any games with LocalRecoveryInProgress.  Then
>>> RecoveryInProgress() remains true during the end-of-recovery checkpoint
>>> and we can condition the TruncateMultiXact and TruncateSUBTRANS calls
>>> on that.  Meanwhile the various places that check RecoveryInProgress
>>> to decide if WAL writing is allowed should call WALWriteAllowed()
>>> instead.
>> No need.
>
> Belay that. Yes, agree need for additional state, though think its more
> like EndRecoveryIsComplete().

Here's a patch implementing the WALWriteAllowed() idea (I'm not wedded
to the name). There's two things that trouble me with this patch:

- CreateCheckPoint() calls AdvanceXLInsertBuffer() before setting
LocalWALWriteAllowed. I don't see anything in AdvanceXLInsertBuffer()
that would fail, but it doesn't feel right. While strictly speaking it
doesn't insert new WAL records, it does write WAL page headers.

- As noted with an XXX comment in the patch, CreateCheckPoint() now
resets LocalWALWriteAllowed to false after a shutdown/end-of-recovery
checkpoint. But that's not enough to stop WAL inserts after a shutdown
checkpoint, because when RecoveryInProgress() is false, we
WALWriteAllowed() still returns true. We haven't had such a safeguard in
place before, so we can keep living without it, but now that we have a
WALWriteAllowed() macro it would be nice if it returned false when WAL
writes are no longer allowed after a shutdown checkpoint. (that would've
caught a bug in Guillaume Smet's original patch to rotate a WAL segment
at shutdown, where the xlog switch was done after shutdown checkpoint)

On whole, this is probably the right way going forward, but I'm not sure
if it'd make 8.4 more or less robust than what's in CVS now.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 7314341..6f86961 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1543,7 +1543,7 @@ CheckPointMultiXact(void)
      * SimpleLruTruncate would get confused.  It seems best not to risk
      * removing any data during recovery anyway, so don't truncate.
      */
-    if (!InRecovery)
+    if (!RecoveryInProgress())
         TruncateMultiXact();

     TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d6f63c7..f28bd4d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -142,6 +142,16 @@ static bool InArchiveRecovery = false;
  */
 static bool LocalRecoveryInProgress = true;

+/*
+ * Am I allowed to write new WAL records? It's always allowed after recovery.
+ * End-of-recovery checkpoint sets LocalWALWriteAllowed before we exit
+ * recovery, so that it can write the checkpoint record even though
+ * RecoveryInProgress() is still true.
+ */
+#define WALWriteAllowed() (LocalWALWriteAllowed || !RecoveryInProgress())
+
+static bool LocalWALWriteAllowed = false;
+
 /* Was the last xlog file restored from archive, or local? */
 static bool restoredFromArchive = false;

@@ -537,7 +547,7 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
     bool        isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);

     /* cross-check on whether we should be here or not */
-    if (RecoveryInProgress())
+    if (!WALWriteAllowed())
         elog(FATAL, "cannot make new WAL entries during recovery");

     /* info's high bits are reserved for use by me */
@@ -1846,7 +1856,7 @@ XLogFlush(XLogRecPtr record)
      * During REDO, we don't try to flush the WAL, but update minRecoveryPoint
      * instead.
      */
-    if (RecoveryInProgress())
+    if (!WALWriteAllowed())
     {
         UpdateMinRecoveryPoint(record, false);
         return;
@@ -1949,7 +1959,7 @@ XLogFlush(XLogRecPtr record)
      * and so we will not force a restart for a bad LSN on a data page.
      */
     if (XLByteLT(LogwrtResult.Flush, record))
-        elog(InRecovery ? WARNING : ERROR,
+        elog(RecoveryInProgress() ? WARNING : ERROR,
         "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
              record.xlogid, record.xrecoff,
              LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
@@ -1977,7 +1987,7 @@ XLogBackgroundFlush(void)
     bool        flexible = true;

     /* XLOG doesn't need flushing during recovery */
-    if (RecoveryInProgress())
+    if (!WALWriteAllowed())
         return;

     /* read LogwrtResult and update local state */
@@ -5849,7 +5859,10 @@ RecoveryInProgress(void)
          * recovery is finished.
          */
         if (!LocalRecoveryInProgress)
+        {
             InitXLOGAccess();
+            LocalWALWriteAllowed = true;
+        }

         return LocalRecoveryInProgress;
     }
@@ -6225,7 +6238,6 @@ CreateCheckPoint(int flags)
     uint32        _logSeg;
     TransactionId *inCommitXids;
     int            nInCommit;
-    bool        OldInRecovery = InRecovery;

     /*
      * An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -6236,33 +6248,9 @@ CreateCheckPoint(int flags)
     else
         shutdown = false;

-    /*
-     * A startup checkpoint is created before anyone else is allowed to
-     * write WAL. To allow us to write the checkpoint record, set
-     * LocalRecoveryInProgress to false. This lets us write WAL, but others
-     * are still not allowed to do so.
-     */
-    if (flags & CHECKPOINT_END_OF_RECOVERY)
-    {
-        Assert(RecoveryInProgress());
-        LocalRecoveryInProgress = false;
-        InitXLOGAccess();
-
-        /*
-         * Before 8.4, end-of-recovery checkpoints were always performed by
-         * the startup process, and InRecovery was set true. InRecovery is not
-         * normally set in bgwriter, but we set it here temporarily to avoid
-         * confusing old code in the end-of-recovery checkpoint code path that
-         * rely on it.
-         */
-        InRecovery = true;
-    }
-    else
-    {
-        /* shouldn't happen */
-        if (RecoveryInProgress())
-            elog(ERROR, "can't create a checkpoint during recovery");
-    }
+    /* shouldn't happen */
+    if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
+        elog(ERROR, "can't create a checkpoint during recovery");

     /*
      * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
@@ -6305,7 +6293,6 @@ CreateCheckPoint(int flags)

     /* Begin filling in the checkpoint WAL record */
     MemSet(&checkPoint, 0, sizeof(checkPoint));
-    checkPoint.ThisTimeLineID = ThisTimeLineID;
     checkPoint.time = (pg_time_t) time(NULL);

     /*
@@ -6473,6 +6460,24 @@ CreateCheckPoint(int flags)
     START_CRIT_SECTION();

     /*
+     * An end-of-recovery checkpoint is created before anyone is allowed to
+     * write WAL. To allow us to write the checkpoint record, temporarily
+     * enable LocalWALWriteAllowed.
+     */
+    if (flags & CHECKPOINT_END_OF_RECOVERY)
+    {
+        Assert(RecoveryInProgress());
+        LocalWALWriteAllowed = true;
+        InitXLOGAccess();
+    }
+
+    /*
+     * this needs to be done after the InitXLOGAccess() call above or
+     * ThisTimeLineID might be uninitialized
+     */
+    checkPoint.ThisTimeLineID = ThisTimeLineID;
+
+    /*
      * Now insert the checkpoint record into XLOG.
      */
     rdata.data = (char *) (&checkPoint);
@@ -6488,6 +6493,19 @@ CreateCheckPoint(int flags)
     XLogFlush(recptr);

     /*
+     * We mustn't write any new WAL after a shutdown checkpoint, or it will
+     * be overwritten at next startup. No-one should even try, this just
+     * allows a bit more sanity-checking. XXX: since RecoveryInProgress() is
+     * false at that point, we'll just set LocalWriteAllowed again if anyone
+     * calls XLogInsert(), so this is actually useless in the shutdown case.
+     *
+     * Don't allow WAL writes after an end-of-recovery checkpoint either. It
+     * will be enabled again after the startup is fully completed.
+     */
+    if (shutdown)
+        LocalWALWriteAllowed = false;
+
+    /*
      * We now have ProcLastRecPtr = start of actual checkpoint record, recptr
      * = end of actual checkpoint record.
      */
@@ -6560,7 +6578,7 @@ CreateCheckPoint(int flags)
      * in subtrans.c).    During recovery, though, we mustn't do this because
      * StartupSUBTRANS hasn't been called yet.
      */
-    if (!InRecovery)
+    if (!RecoveryInProgress())
         TruncateSUBTRANS(GetOldestXmin(true, false));

     /* All real work is done, but log before releasing lock. */
@@ -6574,9 +6592,6 @@ CreateCheckPoint(int flags)
                                      CheckpointStats.ckpt_segs_recycled);

     LWLockRelease(CheckpointLock);
-
-    /* Restore old value */
-    InRecovery = OldInRecovery;
 }

 /*

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
Следующее
От: "Justin"
Дата:
Сообщение: BUG #4886: Password Crash