Re: Problem with synchronous replication
От | Kyotaro Horiguchi |
---|---|
Тема | Re: Problem with synchronous replication |
Дата | |
Msg-id | 20191030.123428.18823202335157111.horikyota.ntt@gmail.com обсуждение исходный текст |
Ответ на | Re: Problem with synchronous replication (Michael Paquier <michael@paquier.xyz>) |
Ответы |
Re: Problem with synchronous replication
(Michael Paquier <michael@paquier.xyz>)
|
Список | pgsql-hackers |
At Wed, 30 Oct 2019 10:45:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote: > > At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in > >> I recently discovered two possible bugs about synchronous replication. > >> > >> 1. SyncRepCleanupAtProcExit may delete an element that has been deleted > >> SyncRepCleanupAtProcExit first checks whether the queue is detached, if it is not detached, > >> acquires the SyncRepLock lock and deletes it. If this element has been deleted by walsender, > >> it will be deleted repeatedly, SHMQueueDelete will core with a segment fault. > >> > >> IMO, like SyncRepCancelWait, we should lock the SyncRepLock first and then check > >> whether the queue is detached or not. > > > > I think you're right here. > > Indeed. Looking at the surroundings we expect some code paths to hold > SyncRepLock in exclusive or shared mode but we don't actually check > that the lock is hold. So let's add some assertions while on it. I looked around closer. If we do that strictly, other functions like SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static functions don't need Assert() and caution in their comments would be enough. On the other hand, the similar-looking code in SyncRepInitConfig and SyncRepUpdateSyncStandbysDefined seems safe since AFAICS it doesn't have (this kind of) racing condition on wirtes. It might need a comment like that. Or, we could go to (apparently) safer-side by applying the same amendment to it. SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without holding SyncRepLock, which could lead to a message with wrong priority. I'm not sure it matters, though. > > This is not right. It is in transaction commit so it is in a > > HOLD_INTERRUPTS section. ProcessInterrupt does not respond to > > cancel/die interrupts thus the ereport should return. > > Yeah. There is an easy way to check after that: InterruptHoldoffCount > needs to be strictly positive. > > My suggestions are attached. Any thoughts? Seems reasonable for holdoffs. The same assertion would be needed in more places but it's another issue. By the way while I was looking this, I found a typo. Please find the attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index a21f7d3347..16aee1de4c 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -1065,8 +1065,8 @@ SyncRepUpdateSyncStandbysDefined(void) /* * If synchronous_standby_names has been reset to empty, it's futile - * for backends to continue to waiting. Since the user no longer - * wants synchronous replication, we'd better wake them up. + * for backends to continue waiting. Since the user no longer wants + * synchronous replication, we'd better wake them up. */ if (!sync_standbys_defined) {
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Michael PaquierДата:
Сообщение: Re: [PATCH] Do not use StdRdOptions in Access Methods