Re: Crash by targetted recovery

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Crash by targetted recovery
Дата
Msg-id 20200227.170530.2264412619413068664.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Crash by targetted recovery  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Ответы Re: Crash by targetted recovery  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Список pgsql-hackers
Thank you for the comment.
At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> >> I failed to understand why random access while reading from
> >> stream is bad idea. Could you elaborate why?
> > It seems to me the word "streaming" suggests that WAL record should be
> > read sequentially. Random access, which means reading from arbitrary
> > location, breaks a stream.  (But the patch doesn't try to stop wal
> > sender if randAccess.)
> > 
> >> Isn't it sufficient to set currentSource to 0 when disabling
> >> StandbyMode?
> > I thought that and it should work, but I hesitated to manipulate on
> > currentSource in StartupXLOG. currentSource is basically a private
> > state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> > think it's not good to modify it out of the the logic in
> > WaitForWALToBecomeAvailable.
> 
> If so, what about adding the following at the top of
> WaitForWALToBecomeAvailable()?
> 
>     if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
>          currentSource = 0;

It works virtually the same way. I'm happy to do that if you don't
agree to using randAccess. But I'd rather do that in the 'if
(!InArchiveRecovery)' section.

> > Come to think of that I got to think the
> > following part in ReadRecord should use randAccess instead..
> > xlog.c:4384
> >>      /*
> > -      * Before we retry, reset lastSourceFailed and currentSource
> > -      * so that we will check the archive next.
> > +      * Streaming has broken, we retry from the same LSN.
> >>       */
> >>      lastSourceFailed = false;
> > -     currentSource = 0;
> > +     private->randAccess = true;
> 
> Sorry, I failed to understand why this change is necessary...

It's not necessary, just for being tidy about the responsibility on
currentSource.

> At least the comment that you added seems incorrect
> because WAL streaming should not have started yet when
> we reach the above point.

Oops, right.

-      * Streaming has broken, we retry from the same LSN.
+      * Restart recovery from the current LSN.

For clarity, I don't insist on the change at all. If it were
necessary, it's another topic, anyway. Please forget it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 8cb817a43226a0d60dd62c6205219a2f0c807b9e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v2 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 32 +++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..8e71788981 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
     'recovery end before target reached is a fatal error');
+
+############
+# Edge case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+                                has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
-- 
2.18.2

From d917638b787b9d1393cbc0d77586168da844756b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 26 Feb 2020 20:41:37 +0900
Subject: [PATCH v2 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..f82a582c53 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11831,7 +11831,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
      *      pg_wal (XLOG_FROM_PG_WAL)
      * 2. Check trigger file
-     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+     * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+     *    If StandbyMode is false, the state machine goes back to 1.
      * 4. Rescan timelines
      * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.
      *
@@ -11846,8 +11847,16 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      */
     if (!InArchiveRecovery)
         currentSource = XLOG_FROM_PG_WAL;
-    else if (currentSource == 0)
+    else if (currentSource == 0 ||
+             (!StandbyMode && currentSource == XLOG_FROM_STREAM))
+    {
+        /*
+         * Shoule be before starting streaming, or should have exited from
+         * streaming.
+         */
+        Assert(!WalRcvStreaming());
         currentSource = XLOG_FROM_ARCHIVE;
+    }
 
     for (;;)
     {
-- 
2.18.2


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: Collation versioning
Следующее
От: Julien Rouhaud
Дата:
Сообщение: Re: reindex concurrently and two toast indexes