Re: the s_lock_stuck on perform_spin_delay
От | Andy Fan |
---|---|
Тема | Re: the s_lock_stuck on perform_spin_delay |
Дата | |
Msg-id | 87h6jkmucz.fsf@163.com обсуждение исходный текст |
Ответ на | Re: the s_lock_stuck on perform_spin_delay (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Ответы |
Re: the s_lock_stuck on perform_spin_delay
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
Hi Matthias, Thanks for the review! Matthias van de Meent <boekewurm+postgres@gmail.com> writes: > On Wed, 10 Jan 2024 at 02:44, Andy Fan <zhihuifan1213@163.com> wrote: >> Hi, >> >> I want to know if Andres or you have plan >> to do some code review. I don't expect this would happen very soon, just >> want to make sure this will not happen that both of you think the other >> one will do, but actually none of them does it in fact. a commit fest >> [1] has been added for this. > > >> +++ b/src/backend/storage/buffer/bufmgr.c >> @@ -5419,6 +5419,7 @@ LockBufHdr(BufferDesc *desc) >> perform_spin_delay(&delayStatus); >> } >> finish_spin_delay(&delayStatus); >> + START_SPIN_LOCK(); >> return old_buf_state | BM_LOCKED; >> } > > I think that we need to 'arm' the checks just before we lock the spin > lock, and 'disarm' the checks just after we unlock the spin lock, > rather than after and before, respectively. That way, we won't have a > chance of false negatives: with your current patch it is possible that > an interrupt fires between the acquisition of the lock and the code in > START_SPIN_LOCK() marking the thread as holding a spin lock, which > would cause any check in that signal handler to incorrectly read that > we don't hold any spin locks. That's a good idea. fixed in v2. > >> +++ b/src/backend/storage/lmgr/lock.c >> @@ -776,6 +776,8 @@ LockAcquireExtended(const LOCKTAG *locktag, >> bool found_conflict; >> bool log_lock = false; >> >> + Assert(SpinLockCount == 0); >> + > > I'm not 100% sure on the policy of this, but theoretically you could > use LockAquireExtended(dontWait=true) while holding a spin lock, as > that would not have an unknown duration. Then again, this function > also does elog/ereport, which would cause issues, still, so this code > may be the better option. I thought this statement as "keeping the current patch as it is" since "not waiting" doesn't means the a few dozen in this case. please correct me if anything wrong. > >> + elog(PANIC, "stuck spinlock detected at %s, %s:%d after waiting for %u ms", >> + func, file, line, delay_ms); > > pg_usleep doesn't actually guarantee that we'll wait for exactly that > duration; depending on signals received while spinning and/or OS > scheduling decisions it may be off by orders of magnitude. True, but I did this for two reasons. a). the other soluation needs call 'time' syscall twice, I didn't want to pay this run-time effort. b). the possiblity of geting a signals during pg_usleep should be low and even that happens, because the message is just for human, we don't need a absolutely accurate number. what do you think? > >> +++ b/src/common/scram-common.c > > This is unrelated to the main patchset Fixed in v2. > >> +++ b/src/include/storage/spin.h > > Minor: I think these changes could better be included in miscadmin, or > at least the definition for SpinLockCount should be moved there: The > spin lock system itself shouldn't be needed in places where we need to > make sure that we don't hold any spinlocks, and miscadmin.h already > holds things related to "System interrupt and critical section > handling", which seems quite related. fixed in v2. -- Best Regards Andy Fan
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Sutou KouheiДата:
Сообщение: Re: Make COPY format extendable: Extract COPY TO format implementations