At Mon, 23 Jul 2018 15:59:16 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180723065916.GI2854@paquier.xyz>
> On Mon, Jul 23, 2018 at 01:57:48PM +0900, Kyotaro HORIGUCHI wrote:
> > I'll register this to the next commitfest.
> >
> > https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyotaro@lab.ntt.co.jp
>
> This is an open item for v11.
Mmm. Thanks. I wrongly thought this was v10 item. Removed this
from the next CF.
Thank you for taking a look.
> >> While considering this, I found a bug in 4b0d28de06, which
> >> removed prior checkpoint from control file. It actually trims the
> >> segments before the last checkpoint's redo segment but recycling
> >> is still considered based on the *prevous* checkpoint. As the
> >> result min_wal_size doesn't work as told. Specifically, setting
> >> min/max_wal_size to 48MB and advance four or more segments then
> >> two checkpoints leaves just one segment, which is less than
> >> min_wal_size.
> >>
> >> The attached patch fixes that. One arguable point on this would
> >> be the removal of the behavior when RemoveXLogFile(name,
> >> InvalidXLogRecPtr, ..).
> >>
> >> The only place calling the function with the parameter is
> >> timeline switching. Previously unconditionally 10 segments are
> >> recycled after switchpoint but the reason for the behavior is we
> >> didn't have the information on previous checkpoint at hand at the
> >> time. But now we can use the timeline switch point as the
> >> approximate of the last checkpoint's redo point and this allows
> >> us to use min/max_wal_size properly at the time.
>
> I have been looking at that, and tested with this simple scenario:
> create table aa (a int);
>
> Then just repeat the following SQLs:
> insert into aa values (1);
> select pg_switch_wal();
> checkpoint;
> select pg_walfile_name(pg_current_wal_lsn());
>
> By doing so, you would notice that the oldest WAL segment does not get
> recycled after the checkpoint, while it should as the redo pointer is
> always checkpoint generated. What happens is that this oldest segment
> gets recycled every two checkpoints.
(I'm not sure I'm getting it correctly..) I see the old segments
recycled. When I see pg_switch_wal() returns 0/12..,
pg_walfile_name is ...13 and segment files for 13 and 14 are
found in pg_wal directory. That is, seg 12 was recycled as seg
14. log_checkpoint=on shows every checkpoint recycles 1 segment
in the case.
> Then I had a look at the proposed patch, with a couple of comments.
>
> - if (PriorRedoPtr == InvalidXLogRecPtr)
> - recycleSegNo = endlogSegNo + 10;
> - else
> - recycleSegNo = XLOGfileslop(PriorRedoPtr);
> + recycleSegNo = XLOGfileslop(RedoRecPtr);
> I think that this is a new behavior, and should not be changed, see
> point 3 below.
>
> In CreateCheckPoint(), the removal of past WAL segments is always going
> to happen as RedoRecPtr is never InvalidXLogRecPtr, so the recycling
> should happen also when PriorRedoPtr is InvalidXLogRecPtr, no?
Yes. The reason for the change was the change of
RemoveNonParentXlogFiles that I made and it is irrelevant to this
bug fix (and rather breaking as you pointed..). So, reverted it.
> /* Trim from the last checkpoint, not the last - 1 */
> This comment could be adjusted, let's remove "not the last - 1" .
Oops! Thanks. The comment has finally vanished and melded into
another comment just above.
| * Delete old log files not required by the last checkpoint and recycle
| * them
> The calculation of _logSegNo in CreateRestartPoint is visibly incorrect,
> this still uses PriorRedoPtr so the bug is not fixed for standbys. The
> tweaks for ThisTimeLineID also need to be out of the loop where
> PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation
> should be kept.
Agreed. While I reconsidered this, I noticed that the estimated
checkpoint distance is 0 when PriorRedoPtr is invalid. This means
that the first checkpoint/restartpoint tries to reduce WAL
segments to min_wal_size. However, it happens only at initdb time
and makes no substantial behavior change so I made the change
ignoring the difference.
> Finally, in summary, this patch is doing actually three things:
> 1) Rename a couple of variables which refer incorrectly to the prior
> checkpoint so as they refer to the last checkpoint generated.
> 2) Make sure that WAL recycling/removal happens based on the last
> checkpoint generated, which is the one just generated when past WAL
> segments are cleaned up as post-operation actions.
> 3) Enforce the recycling point to be the switch point instead of
> arbitrarily recycle 10 segments, which is what b2a5545b has introduced.
> Recycling at exactly the switch point is wrong, as the redo LSN of the
> previous checkpoint may not be at the same segment when a timeline has
> switched, so you would finish with recycling segments which are still
> needed if an instance needs be crash-recovered post-promotion without
> a completed post-recovery checkpoint. In short this is dangerous.
> I'll let Heikki comment on that, but I think that you should fetch the
> last redo LSN instead, still you need to be worried about checkpoints
> running in parallel of the startup process, so I would stick with the
> current logic.
Thank you for the detail. I was coufused a bit there. I agree to
preserve the logic, too.
> 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is
> not.
Thank you for the comments and suggestions. After all, I did the
following things in the attached patch.
- Reverted the change on timeline switching. (Removed the (3))
- Fixed CreateRestartPoint to do the same thing with CreateCheckPoint.
- Both CreateRestart/CheckPoint now tries trimming of WAL
segments even for the first time.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From 662ecab9c2d0efc41756d98e18dbabe060da8d96 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 19 Jul 2018 12:13:56 +0900
Subject: [PATCH] Fix calculation base of WAL recycling
The commit 4b0d28de06 removed the prior checkpoint and related things
but that leaves WAL recycling based on the prior checkpoint. This
makes max_wal_size and min_wal_size work incorrectly. This patch makes
WAL recycling be based on the last checkpoint.
---
src/backend/access/transam/xlog.c | 159 ++++++++++++++++++--------------------
1 file changed, 75 insertions(+), 84 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 335b4a573d..05f750253f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2287,7 +2287,7 @@ assign_checkpoint_completion_target(double newval, void *extra)
* XLOG segments? Returns the highest segment that should be preallocated.
*/
static XLogSegNo
-XLOGfileslop(XLogRecPtr PriorRedoPtr)
+XLOGfileslop(XLogRecPtr RedoRecPtr)
{
XLogSegNo minSegNo;
XLogSegNo maxSegNo;
@@ -2299,9 +2299,9 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
* correspond to. Always recycle enough segments to meet the minimum, and
* remove enough segments to stay below the maximum.
*/
- minSegNo = PriorRedoPtr / wal_segment_size +
+ minSegNo = RedoRecPtr / wal_segment_size +
ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
- maxSegNo = PriorRedoPtr / wal_segment_size +
+ maxSegNo = RedoRecPtr / wal_segment_size +
ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
/*
@@ -2316,7 +2316,7 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
/* add 10% for good measure. */
distance *= 1.10;
- recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) /
+ recycleSegNo = (XLogSegNo) ceil(((double) RedoRecPtr + distance) /
wal_segment_size);
if (recycleSegNo < minSegNo)
@@ -3899,12 +3899,12 @@ RemoveTempXlogFiles(void)
/*
* Recycle or remove all log files older or equal to passed segno.
*
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
* whether we want to recycle rather than delete no-longer-wanted log files.
*/
static void
-RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
{
DIR *xldir;
struct dirent *xlde;
@@ -3947,7 +3947,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
/* Update the last removed location in shared memory first */
UpdateLastRemovedPtr(xlde->d_name);
- RemoveXlogFile(xlde->d_name, PriorRedoPtr, endptr);
+ RemoveXlogFile(xlde->d_name, RedoRecPtr, endptr);
}
}
}
@@ -4021,14 +4021,14 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
/*
* Recycle or remove a log file that's no longer needed.
*
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
* whether we want to recycle rather than delete no-longer-wanted log files.
- * If PriorRedoRecPtr is not known, pass invalid, and the function will
- * recycle, somewhat arbitrarily, 10 future segments.
+ * If RedoRecPtr is not known, pass invalid, and the function will recycle,
+ * somewhat arbitrarily, 10 future segments.
*/
static void
-RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
{
char path[MAXPGPATH];
#ifdef WIN32
@@ -4042,10 +4042,10 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
* Initialize info about where to try to recycle to.
*/
XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
- if (PriorRedoPtr == InvalidXLogRecPtr)
+ if (RedoRecPtr == InvalidXLogRecPtr)
recycleSegNo = endlogSegNo + 10;
else
- recycleSegNo = XLOGfileslop(PriorRedoPtr);
+ recycleSegNo = XLOGfileslop(RedoRecPtr);
snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
@@ -8706,6 +8706,7 @@ CreateCheckPoint(int flags)
bool shutdown;
CheckPoint checkPoint;
XLogRecPtr recptr;
+ XLogSegNo _logSegNo;
XLogCtlInsert *Insert = &XLogCtl->Insert;
uint32 freespace;
XLogRecPtr PriorRedoPtr;
@@ -9072,22 +9073,18 @@ CreateCheckPoint(int flags)
*/
smgrpostckpt();
- /*
- * Delete old log files and recycle them
- */
+ /* Update the average distance between checkpoints/restartpoints. */
if (PriorRedoPtr != InvalidXLogRecPtr)
- {
- XLogSegNo _logSegNo;
-
- /* Update the average distance between checkpoints. */
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
- /* Trim from the last checkpoint, not the last - 1 */
- XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- KeepLogSeg(recptr, &_logSegNo);
- _logSegNo--;
- RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, recptr);
- }
+ /*
+ * Delete old log files (those no longer needed for last checkpoint to
+ * prevent the disk holding the xlog from growing full.
+ */
+ XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+ KeepLogSeg(recptr, &_logSegNo);
+ _logSegNo--;
+ RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
/*
* Make more log segments if needed. (Do this after recycling old log
@@ -9253,6 +9250,11 @@ CreateRestartPoint(int flags)
XLogRecPtr lastCheckPointEndPtr;
CheckPoint lastCheckPoint;
XLogRecPtr PriorRedoPtr;
+ XLogRecPtr receivePtr;
+ XLogRecPtr replayPtr;
+ TimeLineID replayTLI;
+ XLogRecPtr endptr;
+ XLogSegNo _logSegNo;
TimestampTz xtime;
/*
@@ -9394,69 +9396,58 @@ CreateRestartPoint(int flags)
}
LWLockRelease(ControlFileLock);
- /*
- * Delete old log files (those no longer needed even for previous
- * checkpoint/restartpoint) to prevent the disk holding the xlog from
- * growing full.
- */
+ /* Update the average distance between checkpoints/restartpoints. */
if (PriorRedoPtr != InvalidXLogRecPtr)
- {
- XLogRecPtr receivePtr;
- XLogRecPtr replayPtr;
- TimeLineID replayTLI;
- XLogRecPtr endptr;
- XLogSegNo _logSegNo;
-
- /* Update the average distance between checkpoints/restartpoints. */
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
- XLByteToSeg(PriorRedoPtr, _logSegNo, wal_segment_size);
+ /*
+ * Delete old log files (those no longer needed for last restartpoint to
+ * prevent the disk holding the xlog from growing full.
+ */
+ XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
- /*
- * Get the current end of xlog replayed or received, whichever is
- * later.
- */
- receivePtr = GetWalRcvWriteRecPtr(NULL, NULL);
- replayPtr = GetXLogReplayRecPtr(&replayTLI);
- endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
+ /*
+ * Retreat _logSegNo using the current end of xlog replayed or received,
+ * whichever is later.
+ */
+ receivePtr = GetWalRcvWriteRecPtr(NULL, NULL);
+ replayPtr = GetXLogReplayRecPtr(&replayTLI);
+ endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
+ KeepLogSeg(endptr, &_logSegNo);
+ _logSegNo--;
+
+ /*
+ * Try to recycle segments on a useful timeline. If we've been promoted
+ * since the beginning of this restartpoint, use the new timeline chosen
+ * at end of recovery (RecoveryInProgress() sets ThisTimeLineID in that
+ * case). If we're still in recovery, use the timeline we're currently
+ * replaying.
+ *
+ * There is no guarantee that the WAL segments will be useful on the
+ * current timeline; if recovery proceeds to a new timeline right after
+ * this, the pre-allocated WAL segments on this timeline will not be used,
+ * and will go wasted until recycled on the next restartpoint. We'll live
+ * with that.
+ */
+ if (RecoveryInProgress())
+ ThisTimeLineID = replayTLI;
- KeepLogSeg(endptr, &_logSegNo);
- _logSegNo--;
+ RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);
- /*
- * Try to recycle segments on a useful timeline. If we've been
- * promoted since the beginning of this restartpoint, use the new
- * timeline chosen at end of recovery (RecoveryInProgress() sets
- * ThisTimeLineID in that case). If we're still in recovery, use the
- * timeline we're currently replaying.
- *
- * There is no guarantee that the WAL segments will be useful on the
- * current timeline; if recovery proceeds to a new timeline right
- * after this, the pre-allocated WAL segments on this timeline will
- * not be used, and will go wasted until recycled on the next
- * restartpoint. We'll live with that.
- */
- if (RecoveryInProgress())
- ThisTimeLineID = replayTLI;
+ /*
+ * Make more log segments if needed. (Do this after recycling old log
+ * segments, since that may supply some of the needed files.)
+ */
+ PreallocXlogFiles(endptr);
- RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, endptr);
-
- /*
- * Make more log segments if needed. (Do this after recycling old log
- * segments, since that may supply some of the needed files.)
- */
- PreallocXlogFiles(endptr);
-
- /*
- * ThisTimeLineID is normally not set when we're still in recovery.
- * However, recycling/preallocating segments above needed
- * ThisTimeLineID to determine which timeline to install the segments
- * on. Reset it now, to restore the normal state of affairs for
- * debugging purposes.
- */
- if (RecoveryInProgress())
- ThisTimeLineID = 0;
- }
+ /*
+ * ThisTimeLineID is normally not set when we're still in recovery.
+ * However, recycling/preallocating segments above needed ThisTimeLineID
+ * to determine which timeline to install the segments on. Reset it now,
+ * to restore the normal state of affairs for debugging purposes.
+ */
+ if (RecoveryInProgress())
+ ThisTimeLineID = 0;
/*
* Truncate pg_subtrans if possible. We can throw away all data before
--
2.16.3