Thread: BUG #16817: kill process cause postmaster hang

BUG #16817: kill process cause postmaster hang

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16817
Logged by:          Bo Chen
Email address:      bchen90@163.com
PostgreSQL version: 11.8
Operating system:   euleros v2r7 x86_64
Description:

Hi hackers

    Recently we encountered a problem: after killing the walwriter, we
expected the database to recover normally, but it did not (the postmaster
hung in the 'wait dead end' state, and the archiver didn't exit).
    After analyzing this problem, we found that it could be a long-standing
bug. The archiver uses 'system' to call the configured archive command. For
'system', the Linux programmer's manual says: 'During execution of the
command, SIGCHLD will be blocked, and SIGINT and SIGQUIT will be ignored'.

    So, when a child crashes, we send SIGQUIT to the archiver only once. If
the archiver is executing 'system' at that moment, SIGQUIT is ignored, and
the postmaster hangs in the 'wait dead end' state.

    For this problem, we have added a SIGUSR2 for the archiver after the
SIGQUIT in HandleChildCrash. Is there any other solution?

   regards,ChenBo
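
The behaviour described in the report can be reproduced outside PostgreSQL. The sketch below is only an illustration, not PostgreSQL code: a forked child stands in for the archiver, `sleep` stands in for the archive command, and the function name and its parameters are made up for this example. It relies on Python's `os.system()` being implemented by calling the C `system()` function, so the same signal handling applies on a POSIX system.

```python
import os
import signal
import time


def sigquit_survives_system(command="sleep 2", delay=0.5):
    """Return True if a process busy in system() ignores a single SIGQUIT.

    The forked child plays the role of the archiver and `command` plays the
    role of the archive command. Both are illustrative assumptions.
    """
    pid = os.fork()
    if pid == 0:
        # Child: os.system() calls C system(), which ignores SIGINT and
        # SIGQUIT in the calling process until the command has finished.
        os.system(command)
        os._exit(0)

    time.sleep(delay)             # give the child time to enter system()
    os.kill(pid, signal.SIGQUIT)  # the one-time SIGQUIT from the report
    _, status = os.waitpid(pid, 0)
    # A normal exit means the SIGQUIT was ignored; death by signal means
    # it was delivered with its default action.
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Calling `sigquit_survives_system()` should return `True` when the signal arrives while the child is still inside `system()`, which is the window the report is concerned with.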


Re: BUG #16817: kill process cause postmaster hang

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
>     Recently we encountered a problem: after killing the walwriter, we
> expected the database to recover normally, but it did not (the postmaster
> hung in the 'wait dead end' state, and the archiver didn't exit).
>     After analyzing this problem, we found that it could be a long-standing
> bug. The archiver uses 'system' to call the configured archive command. For
> 'system', the Linux programmer's manual says: 'During execution of the
> command, SIGCHLD will be blocked, and SIGINT and SIGQUIT will be ignored'.

>     So, when a child crashes, we send SIGQUIT to the archiver only once. If
> the archiver is executing 'system' at that moment, SIGQUIT is ignored, and
> the postmaster hangs in the 'wait dead end' state.

Not sure I believe this: why wouldn't the SIGKILL-after-5-seconds logic
get us out of that situation?

            regards, tom lane



Re: BUG #16817: kill process cause postmaster hang

From
bchen90
Date:
Hi, Tom

    Thanks for your reply. Can you elaborate on the "SIGKILL-after-5-seconds
logic"?


regards, chenbo






Re: BUG #16817: kill process cause postmaster hang

From
Andy Fan
Date:


On Mon, Jan 25, 2021 at 9:01 AM bchen90 <bchen90@163.com> wrote:
> Hi, Tom
>
>     Thanks for your reply. Can you elaborate on the "SIGKILL-after-5-seconds
> logic"?
>
> regards, chenbo

82233ce7ea42d6ba519aaec63008aff49da6c7af should be the commit Tom was
talking about.  

commit 82233ce7ea42d6ba519aaec63008aff49da6c7af
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Fri Jun 28 17:20:53 2013 -0400

    Send SIGKILL to children if they don't die quickly in immediate shutdown

    On immediate shutdown, or during a restart-after-crash sequence,
    postmaster used to send SIGQUIT (and then abandon ship if shutdown); but
    this is not a good strategy if backends don't die because of that
    signal.  (This might happen, for example, if a backend gets tangled
    trying to malloc() due to gettext(), as in an example illustrated by
    MauMau.)  This causes problems when later trying to restart the server,
    because some processes are still attached to the shared memory segment.

    Instead of just abandoning such backends to their fates, we now have
    postmaster hang around for a little while longer, send a SIGKILL after
    some reasonable waiting period, and then exit.  This makes immediate
    shutdown more reliable.

    There is disagreement on whether it's best for postmaster to exit after
    sending SIGKILL, or to stick around until all children have reported
    death.  If this controversy is resolved differently than what this patch
    implements, it's an easy change to make.

    Bug reported by MauMau in message 20DAEA8949EC4E2289C6E8E58560DEC0@maumau

    MauMau and Álvaro Herrera

--
Best Regards
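
The escalation that the commit message describes (SIGQUIT first, SIGKILL if the process has not died after a waiting period) can be sketched as follows. This is a minimal illustration under stated assumptions, not the postmaster's implementation: the function name, the polling loop, and the `poll_secs` parameter are hypothetical, and the 5-second default is taken only from the phrase "SIGKILL-after-5-seconds" used earlier in the thread.

```python
import os
import signal
import time


def quit_then_kill(pid, grace_secs=5.0, poll_secs=0.05):
    """Send SIGQUIT to child `pid`; escalate to SIGKILL after `grace_secs`.

    Returns the number of the signal that ended the child, or 0 if the
    child exited normally. `pid` must be a child of the calling process.
    """
    os.kill(pid, signal.SIGQUIT)
    deadline = time.monotonic() + grace_secs
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            # The child went away within the grace period.
            return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
        time.sleep(poll_secs)
    # Still alive: SIGKILL cannot be caught, blocked, or ignored.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
```

For a child that is busy inside `system()` and therefore ignores SIGQUIT, the function should fall through to the SIGKILL branch once the grace period ends, which is the way out of the hang that Tom's question points at.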

Re: BUG #16817: kill process cause postmaster hang

From
Michael Paquier
Date:
On Sun, Jan 24, 2021 at 06:01:04PM -0700, bchen90 wrote:
>     Thanks for your reply. Can you elaborate on the "SIGKILL-after-5-seconds
> logic"?

You are looking for the changes that this command turns up in
postmaster.c:
git grep SIGKILL_CHILDREN_AFTER_SECS
--
Michael
