Re: Add Information during standby recovery conflicts

Поиск
Список
Период
Сортировка
От Drouvot, Bertrand
Тема Re: Add Information during standby recovery conflicts
Дата
Msg-id 5517bbfd-27ad-3e97-51e2-cb9ffb8f8a8d@amazon.com
обсуждение исходный текст
Ответ на Re: Add Information during standby recovery conflicts  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Ответы Re: Add Information during standby recovery conflicts  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Список pgsql-hackers
Hi,

On 1/6/21 6:31 PM, Fujii Masao wrote:
> CAUTION: This email originated from outside of the organization. Do 
> not click links or open attachments unless you can confirm the sender 
> and know the content is safe.
>
>
>
> On 2020/12/15 0:20, Fujii Masao wrote:
>>
>>
>> On 2020/12/14 21:31, Fujii Masao wrote:
>>>
>>>
>>> On 2020/12/05 12:38, Masahiko Sawada wrote:
>>>> On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand 
>>>> <bdrouvot@amazon.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 12/4/20 2:21 AM, Fujii Masao wrote:
>>>>>>
>>>>>> On 2020/12/04 9:28, Masahiko Sawada wrote:
>>>>>>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
>>>>>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote:
>>>>>>>>>> CAUTION: This email originated from outside of the organization.
>>>>>>>>>> Do not click links or open attachments unless you can confirm 
>>>>>>>>>> the
>>>>>>>>>> sender and know the content is safe.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
>>>>>>>>>> <alvherre@alvh.no-ip.org> wrote:
>>>>>>>>>>> On 2020-Dec-01, Fujii Masao wrote:
>>>>>>>>>>>
>>>>>>>>>>>> + if (proc)
>>>>>>>>>>>> +                     {
>>>>>>>>>>>> +                             if (nprocs == 0)
>>>>>>>>>>>> + appendStringInfo(&buf, "%d", proc->pid);
>>>>>>>>>>>> +                             else
>>>>>>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid);
>>>>>>>>>>>> +
>>>>>>>>>>>> +                             nprocs++;
>>>>>>>>>>>>
>>>>>>>>>>>> What happens if all the backends in wait_list have gone? In
>>>>>>>>>>>> other words,
>>>>>>>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs
>>>>>>>>>>>> has not been
>>>>>>>>>>>> incrmented at all)? This would very rarely happen, but can 
>>>>>>>>>>>> happen.
>>>>>>>>>>>> In this case, since buf.data is empty, at least there seems no
>>>>>>>>>>>> need to log
>>>>>>>>>>>> the list of conflicting processes in detail message.
>>>>>>>>>>> Yes, I noticed this too; this can be simplified by changing the
>>>>>>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than
>>>>>>>>>>> wait_list being null), otherwise not print the errdetail.  (You
>>>>>>>>>>> could
>>>>>>>>>>> test buf.data or buf.len instead, but that seems uglier to me.)
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Maybe we can also improve the comment of this function from:
>>>>>>>>>>
>>>>>>>>>> + * This function also reports the details about the conflicting
>>>>>>>>>> + * process ids if *wait_list is not NULL.
>>>>>>>>>>
>>>>>>>>>> to " This function also reports the details about the 
>>>>>>>>>> conflicting
>>>>>>>>>> process ids if exist" or something.
>>>>>>>>>>
>>>>>>>>> Thank you all for the review/remarks.
>>>>>>>>>
>>>>>>>>> They have been addressed in the new attached patch version.
>>>>>>>>
>>>>>>>> Thanks for updating the patch! I read through the patch again
>>>>>>>> and applied the following chages to it. Attached is the updated
>>>>>>>> version of the patch. Could you review this version? If there is
>>>>>>>> no issue in it, I'm thinking to commit this version.
>>>>>>>
>>>>>>> Thank you for updating the patch! I have one question.
>>>>>>>
>>>>>>>>
>>>>>>>> +                       timeouts[cnt].id = STANDBY_TIMEOUT;
>>>>>>>> +                       timeouts[cnt].type = TMPARAM_AFTER;
>>>>>>>> +                       timeouts[cnt].delay_ms = DeadlockTimeout;
>>>>>>>>
>>>>>>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
>>>>>>>> I changed the code that way.
>>>>>>>
>>>>>>> As the comment of ResolveRecoveryConflictWithLock() says the
>>>>>>> following, a deadlock is detected by the ordinary backend process:
>>>>>>>
>>>>>>>    * Deadlocks involving the Startup process and an ordinary 
>>>>>>> backend
>>>>>>> proces
>>>>>>>    * will be detected by the deadlock detector within the ordinary
>>>>>>> backend.
>>>>>>>
>>>>>>> If we use STANDBY_DEADLOCK_TIMEOUT,
>>>>>>> SendRecoveryConflictWithBufferPin() will be called after
>>>>>>> DeadlockTimeout passed, but I think it's not necessary for the 
>>>>>>> startup
>>>>>>> process in this case.
>>>>>>
>>>>>> Thanks for pointing this! You are right.
>>>>>>
>>>>>>
>>>>>>> If we want to just wake up the startup process
>>>>>>> maybe we can use STANDBY_TIMEOUT here?
>>>>>>
>>>>> Thanks for the patch updates! Except what we are still discussing 
>>>>> below,
>>>>> it looks good to me.
>>>>>
>>>>>> When STANDBY_TIMEOUT happens, a request to release conflicting 
>>>>>> buffer
>>>>>> pins is sent. Right? If so, we should not also use 
>>>>>> STANDBY_TIMEOUT there?
>>>>>
>>>>> Agree
>>>>>
>>>>>>
>>>>>> Or, first of all, we don't need to enable the deadlock timer at all?
>>>>>> Since what we'd like to do is to wake up after deadlock_timeout
>>>>>> passes, we can do that by changing ProcWaitForSignal() so that it 
>>>>>> can
>>>>>> accept the timeout and giving the deadlock_timeout to it. If we do
>>>>>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
>>>>>> ResolveRecoveryConflictWithLock(). Thought?
>>>>
>>>> Where do we enable deadlock timeout in hot standby case? You meant to
>>>> enable it in ProcWaitForSignal() or where we set a timer for not hot
>>>> standby case, in ProcSleep()?
>>>
>>> No, what I tried to say is to change 
>>> ResolveRecoveryConflictWithLock() so that it does
>>>
>>> 1. calculate the "minimum" timeout from deadlock_timeout and 
>>> max_standby_xxx_delay
>>> 2. give the calculated timeout value to ProcWaitForSignal()
>>> 3. wait for signal and timeout on ProcWaitForSignal()
>>>
>>> Since ProcWaitForSignal() calls WaitLatch(), seems it's not so 
>>> difficult to make ProcWaitForSignal() handle the timeout. If we do 
>>> this, I was thinking that we can get rid of enable_timeouts() from 
>>> ResolveRecoveryConflictWithLock().
>>>
>>>
>>>>
>>>>>
>>>>> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it 
>>>>> triggers
>>>>> a call to StandbyLockTimeoutHandler() which does nothing, except 
>>>>> waking
>>>>> up. That's what we want, right?)
>>>>
>>>> Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup
>>>> process can wake up and do nothing. Thank you for pointing out.
>>>
>>> Okay, understood! Firstly I was thinking that enabling the same type 
>>> (i.e., STANDBY_LOCK_TIMEOUT) of lock twice doesn't work properly, 
>>> but as far as I read the code, it works. In that case, only the 
>>> shorter timeout would be activated in enable_timeouts(). So I agree 
>>> to use STANDBY_LOCK_TIMEOUT.
>>
>> So I renamed the argument "deadlock_timer" in 
>> ResolveRecoveryConflictWithLock()
>> because it's not the timer for deadlock and is confusing. Attached is 
>> the
>> updated version of the patch. Barring any objection, I will commit 
>> this version.
>
> Since the recent commit 8900b5a9d5 changed the recovery conflict code,
> I updated the patch. Attached is the updated version of the patch.
>
Thanks for those updates!

I had a look and the patch does look good to me.

As far the other threads regarding:

- "maybe_log_conflict" and "maybe_update_title" naming: I don’t have 
strong opinions about it but I am more inclined to stay with the “maybe” 
naming (as it is currently in this patch version) as it better reflects 
that this may or not occur.
- the errdetail log message format in LogRecoveryConflict() (currently 
looks like “Conflicting process: 25118.”) : I don’t have strong opinions 
about it but I am more inclined to stay as it is, as it looks similar as 
the format being used in ProcSleep() (even if we can find different 
formats in other places though).

Bertrand




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Josef Šimánek
Дата:
Сообщение: Re: [PATCH] Simple progress reporting for COPY command
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: pgbench stopped supporting large number of client connections on Windows