Re: Problem with synchronous replication

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Problem with synchronous replication
Дата
Msg-id 20191030.123428.18823202335157111.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Problem with synchronous replication  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Problem with synchronous replication  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
At Wed, 30 Oct 2019 10:45:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
> > At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in 
> >> I recently discovered two possible bugs about synchronous replication.
> >> 
> >> 1. SyncRepCleanupAtProcExit may delete an element that has been deleted
> >> SyncRepCleanupAtProcExit first checks whether the queue is detached, if it is not detached, 
> >> acquires the SyncRepLock lock and deletes it. If this element has been deleted by walsender, 
> >> it will be deleted repeatedly, SHMQueueDelete will core with a segment fault. 
> >> 
> >> IMO, like SyncRepCancelWait, we should lock the SyncRepLock first and then check
> >> whether the queue is detached or not.
> > 
> > I think you're right here.
> 
> Indeed.  Looking at the surroundings we expect some code paths to hold
> SyncRepLock in exclusive or shared mode but we don't actually check
> that the lock is hold.  So let's add some assertions while on it.

I looked around closer.

If we do that strictly, other functions like
SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static
functions don't need Assert() and caution in their comments would be
enough.

On the other hand, the similar-looking code in SyncRepInitConfig and
SyncRepUpdateSyncStandbysDefined seems safe since AFAICS it doesn't
have (this kind of) racing condition on wirtes. It might need a
comment like that. Or, we could go to (apparently) safer-side by
applying the same amendment to it.

SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without
holding SyncRepLock, which could lead to a message with wrong
priority. I'm not sure it matters, though.

> > This is not right. It is in transaction commit so it is in a
> > HOLD_INTERRUPTS section. ProcessInterrupt does not respond to
> > cancel/die interrupts thus the ereport should return.
> 
> Yeah.  There is an easy way to check after that: InterruptHoldoffCount
> needs to be strictly positive.
> 
> My suggestions are attached.  Any thoughts?

Seems reasonable for holdoffs. The same assertion would be needed in
more places but it's another issue.


By the way while I was looking this, I found a typo. Please find the
attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index a21f7d3347..16aee1de4c 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -1065,8 +1065,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
         /*
          * If synchronous_standby_names has been reset to empty, it's futile
-         * for backends to continue to waiting.  Since the user no longer
-         * wants synchronous replication, we'd better wake them up.
+         * for backends to continue waiting.  Since the user no longer wants
+         * synchronous replication, we'd better wake them up.
          */
         if (!sync_standbys_defined)
         {

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [PATCH] Do not use StdRdOptions in Access Methods
Следующее
От: vignesh C
Дата:
Сообщение: Re: Typos and inconsistencies in code