Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)

From: Tomas Vondra
Subject: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)
Date:
Msg-id: 977682d0-a23e-9529-ccc6-8872769c0f6e@enterprisedb.com
In response to: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables) (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables) (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers

On 8/11/23 21:51, Thomas Munro wrote:
> On Sat, Aug 12, 2023 at 5:51 AM Andres Freund <andres@anarazel.de> wrote:
>> On 2023-08-11 15:31:43 +0200, Tomas Vondra wrote:
>>> It seems to me the issue is in WalSndWait, which was reworked to use
>>> ConditionVariableCancelSleep() in bc971f4025c. The walsenders end up
>>> waking each other in a busy loop, until the timing changes just enough
>>> to break the cycle.
>>
>> IMO ConditionVariableCancelSleep()'s behaviour of waking up additional
>> processes can nearly be considered a bug, at least when combined with
>> ConditionVariableBroadcast(). In that case the wakeup is never needed, and it
>> can cause situations like this, where condition variables basically
>> deteriorate to a busy loop.
>>
>> I hit this with AIO as well. I've been "solving" it by adding a
>> ConditionVariableCancelSleepEx(), which has an only_broadcasts argument.
>>
>> I'm inclined to think that any code that needs the forwarding behaviour
>> is pretty much buggy.
> 
> Oh, I see what's happening.  Maybe commit b91dd9de wasn't the best
> idea, but bc971f4025c broke an assumption, since it doesn't use
> ConditionVariableSleep().  That is confusing the signal forwarding
> logic which expects to find our entry in the wait list in the common
> case.
> 
> What do you think about this patch?

I'm not familiar enough with the condition variable code to have an
opinion, but the patch seems to resolve the issue for me - I can no
longer reproduce the high CPU usage.
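
For anyone following along, the ping-pong described in the quoted text can
be reproduced in a toy model. This is only a sketch under loud assumptions:
it is not PostgreSQL code, the names (cv_signal, cv_broadcast, simulate,
forward_on_cancel) are made up for illustration, and it models two waiters
that each "prepare to sleep", wait on a latch, and "cancel sleep" without
ever going through a ConditionVariableSleep()-style step that would put
them back on the wait list before the cancel.

```c
#include <stdbool.h>
#include <stdio.h>

#define NPROCS 2

/* Toy state (hypothetical, not PostgreSQL's structures): one wait list
 * for a single condition variable, and one latch per process. */
static bool in_wait_list[NPROCS];
static bool latch_set[NPROCS];

/* Wake at most one listed waiter: take it off the list, set its latch. */
static void cv_signal(void)
{
    for (int i = 0; i < NPROCS; i++)
    {
        if (in_wait_list[i])
        {
            in_wait_list[i] = false;
            latch_set[i] = true;
            return;
        }
    }
}

/* Wake every listed waiter. */
static void cv_broadcast(void)
{
    for (int i = 0; i < NPROCS; i++)
    {
        if (in_wait_list[i])
        {
            in_wait_list[i] = false;
            latch_set[i] = true;
        }
    }
}

/* Run max_steps round-robin scheduler steps after a single broadcast and
 * return how many times any process woke up. */
static int simulate(bool forward_on_cancel, int max_steps)
{
    int wakeups = 0;

    /* Everyone starts out prepared to sleep, latch clear. */
    for (int i = 0; i < NPROCS; i++)
    {
        in_wait_list[i] = true;
        latch_set[i] = false;
    }

    cv_broadcast();             /* the one "real" event */

    for (int step = 0; step < max_steps; step++)
    {
        int p = step % NPROCS;

        if (!latch_set[p])
            continue;           /* still asleep */

        latch_set[p] = false;
        wakeups++;

        /* "Cancel sleep": if we are no longer on the list, somebody
         * signaled us. The forwarding variant passes that on. */
        if (in_wait_list[p])
            in_wait_list[p] = false;
        else if (forward_on_cancel)
            cv_signal();

        /* Loop around: prepare to sleep again. */
        in_wait_list[p] = true;
    }
    return wakeups;
}
```

In this model, simulate(true, 100) reports 100 wakeups (every scheduler
step wakes somebody, because each cancel forwards a signal to the other
waiter), while simulate(false, 100) reports 2, one per waiter for the
single broadcast. Real timing is of course much messier than a
round-robin loop, so this only illustrates the shape of the cycle.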

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


