Re: Add Information during standby recovery conflicts
От | Drouvot, Bertrand |
---|---|
Тема | Re: Add Information during standby recovery conflicts |
Дата | |
Msg-id | 5517bbfd-27ad-3e97-51e2-cb9ffb8f8a8d@amazon.com обсуждение исходный текст |
Ответ на | Re: Add Information during standby recovery conflicts (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Ответы |
Re: Add Information during standby recovery conflicts
(Fujii Masao <masao.fujii@oss.nttdata.com>)
|
Список | pgsql-hackers |
Hi, On 1/6/21 6:31 PM, Fujii Masao wrote: > CAUTION: This email originated from outside of the organization. Do > not click links or open attachments unless you can confirm the sender > and know the content is safe. > > > > On 2020/12/15 0:20, Fujii Masao wrote: >> >> >> On 2020/12/14 21:31, Fujii Masao wrote: >>> >>> >>> On 2020/12/05 12:38, Masahiko Sawada wrote: >>>> On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand >>>> <bdrouvot@amazon.com> wrote: >>>>> >>>>> Hi, >>>>> >>>>> On 12/4/20 2:21 AM, Fujii Masao wrote: >>>>>> >>>>>> On 2020/12/04 9:28, Masahiko Sawada wrote: >>>>>>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao >>>>>>> <masao.fujii@oss.nttdata.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote: >>>>>>>>>> CAUTION: This email originated from outside of the organization. >>>>>>>>>> Do not click links or open attachments unless you can confirm >>>>>>>>>> the >>>>>>>>>> sender and know the content is safe. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera >>>>>>>>>> <alvherre@alvh.no-ip.org> wrote: >>>>>>>>>>> On 2020-Dec-01, Fujii Masao wrote: >>>>>>>>>>> >>>>>>>>>>>> + if (proc) >>>>>>>>>>>> + { >>>>>>>>>>>> + if (nprocs == 0) >>>>>>>>>>>> + appendStringInfo(&buf, "%d", proc->pid); >>>>>>>>>>>> + else >>>>>>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid); >>>>>>>>>>>> + >>>>>>>>>>>> + nprocs++; >>>>>>>>>>>> >>>>>>>>>>>> What happens if all the backends in wait_list have gone? In >>>>>>>>>>>> other words, >>>>>>>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs >>>>>>>>>>>> has not been >>>>>>>>>>>> incrmented at all)? This would very rarely happen, but can >>>>>>>>>>>> happen. >>>>>>>>>>>> In this case, since buf.data is empty, at least there seems no >>>>>>>>>>>> need to log >>>>>>>>>>>> the list of conflicting processes in detail message. >>>>>>>>>>> Yes, I noticed this too; this can be simplified by changing the >>>>>>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than >>>>>>>>>>> wait_list being null), otherwise not print the errdetail. (You >>>>>>>>>>> could >>>>>>>>>>> test buf.data or buf.len instead, but that seems uglier to me.) >>>>>>>>>> +1 >>>>>>>>>> >>>>>>>>>> Maybe we can also improve the comment of this function from: >>>>>>>>>> >>>>>>>>>> + * This function also reports the details about the conflicting >>>>>>>>>> + * process ids if *wait_list is not NULL. >>>>>>>>>> >>>>>>>>>> to " This function also reports the details about the >>>>>>>>>> conflicting >>>>>>>>>> process ids if exist" or something. >>>>>>>>>> >>>>>>>>> Thank you all for the review/remarks. >>>>>>>>> >>>>>>>>> They have been addressed in the new attached patch version. >>>>>>>> >>>>>>>> Thanks for updating the patch! I read through the patch again >>>>>>>> and applied the following chages to it. Attached is the updated >>>>>>>> version of the patch. Could you review this version? If there is >>>>>>>> no issue in it, I'm thinking to commit this version. >>>>>>> >>>>>>> Thank you for updating the patch! I have one question. >>>>>>> >>>>>>>> >>>>>>>> + timeouts[cnt].id = STANDBY_TIMEOUT; >>>>>>>> + timeouts[cnt].type = TMPARAM_AFTER; >>>>>>>> + timeouts[cnt].delay_ms = DeadlockTimeout; >>>>>>>> >>>>>>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here? >>>>>>>> I changed the code that way. >>>>>>> >>>>>>> As the comment of ResolveRecoveryConflictWithLock() says the >>>>>>> following, a deadlock is detected by the ordinary backend process: >>>>>>> >>>>>>> * Deadlocks involving the Startup process and an ordinary >>>>>>> backend >>>>>>> proces >>>>>>> * will be detected by the deadlock detector within the ordinary >>>>>>> backend. >>>>>>> >>>>>>> If we use STANDBY_DEADLOCK_TIMEOUT, >>>>>>> SendRecoveryConflictWithBufferPin() will be called after >>>>>>> DeadlockTimeout passed, but I think it's not necessary for the >>>>>>> startup >>>>>>> process in this case. >>>>>> >>>>>> Thanks for pointing this! You are right. >>>>>> >>>>>> >>>>>>> If we want to just wake up the startup process >>>>>>> maybe we can use STANDBY_TIMEOUT here? >>>>>> >>>>> Thanks for the patch updates! Except what we are still discussing >>>>> below, >>>>> it looks good to me. >>>>> >>>>>> When STANDBY_TIMEOUT happens, a request to release conflicting >>>>>> buffer >>>>>> pins is sent. Right? If so, we should not also use >>>>>> STANDBY_TIMEOUT there? >>>>> >>>>> Agree >>>>> >>>>>> >>>>>> Or, first of all, we don't need to enable the deadlock timer at all? >>>>>> Since what we'd like to do is to wake up after deadlock_timeout >>>>>> passes, we can do that by changing ProcWaitForSignal() so that it >>>>>> can >>>>>> accept the timeout and giving the deadlock_timeout to it. If we do >>>>>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from >>>>>> ResolveRecoveryConflictWithLock(). Thought? >>>> >>>> Where do we enable deadlock timeout in hot standby case? You meant to >>>> enable it in ProcWaitForSignal() or where we set a timer for not hot >>>> standby case, in ProcSleep()? >>> >>> No, what I tried to say is to change >>> ResolveRecoveryConflictWithLock() so that it does >>> >>> 1. calculate the "minimum" timeout from deadlock_timeout and >>> max_standby_xxx_delay >>> 2. give the calculated timeout value to ProcWaitForSignal() >>> 3. wait for signal and timeout on ProcWaitForSignal() >>> >>> Since ProcWaitForSignal() calls WaitLatch(), seems it's not so >>> difficult to make ProcWaitForSignal() handle the timeout. If we do >>> this, I was thinking that we can get rid of enable_timeouts() from >>> ResolveRecoveryConflictWithLock(). >>> >>> >>>> >>>>> >>>>> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it >>>>> triggers >>>>> a call to StandbyLockTimeoutHandler() which does nothing, except >>>>> waking >>>>> up. That's what we want, right?) >>>> >>>> Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup >>>> process can wake up and do nothing. Thank you for pointing out. >>> >>> Okay, understood! Firstly I was thinking that enabling the same type >>> (i.e., STANDBY_LOCK_TIMEOUT) of lock twice doesn't work properly, >>> but as far as I read the code, it works. In that case, only the >>> shorter timeout would be activated in enable_timeouts(). So I agree >>> to use STANDBY_LOCK_TIMEOUT. >> >> So I renamed the argument "deadlock_timer" in >> ResolveRecoveryConflictWithLock() >> because it's not the timer for deadlock and is confusing. Attached is >> the >> updated version of the patch. Barring any objection, I will commit >> this version. > > Since the recent commit 8900b5a9d5 changed the recovery conflict code, > I updated the patch. Attached is the updated version of the patch. > Thanks for those updates! I had a look and the patch does look good to me. As far the other threads regarding: - "maybe_log_conflict" and "maybe_update_title" naming: I don’t have strong opinions about it but I am more inclined to stay with the “maybe” naming (as it is currently in this patch version) as it better reflects that this may or not occur. - the errdetail log message format in LogRecoveryConflict() (currently looks like “Conflicting process: 25118.”) : I don’t have strong opinions about it but I am more inclined to stay as it is, as it looks similar as the format being used in ProcSleep() (even if we can find different formats in other places though). Bertrand
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Josef ŠimánekДата:
Сообщение: Re: [PATCH] Simple progress reporting for COPY command
Следующее
От: Masahiko SawadaДата:
Сообщение: Re: pgbench stopped supporting large number of client connections on Windows