Re: the s_lock_stuck on perform_spin_delay

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: the s_lock_stuck on perform_spin_delay
Дата
Msg-id 87h6jkmucz.fsf@163.com
обсуждение исходный текст
Ответ на Re: the s_lock_stuck on perform_spin_delay  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы Re: the s_lock_stuck on perform_spin_delay  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi Matthias,

Thanks for the review!

Matthias van de Meent <boekewurm+postgres@gmail.com> writes:

> On Wed, 10 Jan 2024 at 02:44, Andy Fan <zhihuifan1213@163.com> wrote:
>> Hi,
>>
>> I want to know if Andres or you have plan
>> to do some code review. I don't expect this would happen very soon, just
>> want to make sure this will not happen that both of you think the other
>> one will do, but actually none of them does it in fact. a commit fest
>> [1] has been added for this.
>
>
>> +++ b/src/backend/storage/buffer/bufmgr.c
>> @@ -5419,6 +5419,7 @@ LockBufHdr(BufferDesc *desc)
>>         perform_spin_delay(&delayStatus);
>>     }
>>     finish_spin_delay(&delayStatus);
>> +    START_SPIN_LOCK();
>>     return old_buf_state | BM_LOCKED;
>> }
>
> I think that we need to 'arm' the checks just before we lock the spin
> lock, and 'disarm' the checks just after we unlock the spin lock,
> rather than after and before, respectively. That way, we won't have a
> chance of false negatives: with your current patch it is possible that
> an interrupt fires between the acquisition of the lock and the code in
> START_SPIN_LOCK() marking the thread as holding a spin lock, which
> would cause any check in that signal handler to incorrectly read that
> we don't hold any spin locks.

That's a good idea. fixed in v2.

>
>> +++ b/src/backend/storage/lmgr/lock.c
>> @@ -776,6 +776,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
>>     bool        found_conflict;
>>     bool        log_lock = false;
>>
>> +    Assert(SpinLockCount == 0);
>> +
>
> I'm not 100% sure on the policy of this, but theoretically you could
> use LockAquireExtended(dontWait=true) while holding a spin lock, as
> that would not have an unknown duration. Then again, this function
> also does elog/ereport, which would cause issues, still, so this code
> may be the better option.

I thought this statement as "keeping the current patch as it is" since
"not waiting" doesn't means the a few dozen in this case. please
correct me if anything wrong.

>
>> +    elog(PANIC, "stuck spinlock detected at %s, %s:%d after waiting for %u ms",
>> +         func, file, line, delay_ms);
>
> pg_usleep doesn't actually guarantee that we'll wait for exactly that
> duration; depending on signals received while spinning and/or OS
> scheduling decisions it may be off by orders of magnitude.

True, but I did this for two reasons. a). the other soluation needs call
'time' syscall twice, I didn't want to pay this run-time effort. b). the
possiblity of geting a signals during pg_usleep should be low and
even that happens, because the message is just for human, we don't need
a absolutely accurate number. what do you think?

>
>> +++ b/src/common/scram-common.c
>
> This is unrelated to the main patchset

Fixed in v2.

>
>> +++ b/src/include/storage/spin.h
>
> Minor: I think these changes could better be included in miscadmin, or
> at least the definition for SpinLockCount should be moved there: The
> spin lock system itself shouldn't be needed in places where we need to
> make sure that we don't hold any spinlocks, and miscadmin.h already
> holds things related to "System interrupt and critical section
> handling", which seems quite related.

fixed in v2. 

-- 
Best Regards
Andy Fan


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sutou Kouhei
Дата:
Сообщение: Re: Make COPY format extendable: Extract COPY TO format implementations
Следующее
От: Peter Smith
Дата:
Сообщение: Re: Synchronizing slots from primary to standby