Thread: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)

all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)

From
Mark Aufflick
Date:
DEBUG:  server process (pid 971) was terminated by signal 14
DEBUG:  terminating any other active server processes

seems to be happening every day or so.

The server log doesn't indicate any problems more serious than the
occasional unexpected EOF on client connections, a few 'adding missing
FROM-clause' notices, and a stack of name truncation log entries.

The clients are AOLServer/OpenACS and a perl daemon that forks off a
handful of children (which only access two tables).

Both plpgsql and plperlu are used (plperlu is used for one trigger
function to post a single https form that sends an sms message, and
record the result body).

I have trawled the mailing list archives and the only similar deaths I
have found are under cygwin, but I am running a fully up2date redhat
7.2 box (with custom compiled 7.2.3 from sources as of a month ago).

Any ideas would be greatly appreciated!

Cheers,

Mark.
--
Mark Aufflick
e: mark@pumptheory.com
w: www.pumptheory.com
p: +61 438 700 647


Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)

From
Tom Lane
Date:
Mark Aufflick <mark@pumptheory.com> writes:
> DEBUG:  server process (pid 971) was terminated by signal 14

Hm, that's SIGALRM on my box, I assume so on yours too.

AFAICT, there is no part of the Postgres code that runs with SIGALRM
set to default handling: it's either SIG_IGN or the deadlock timer
handler.

> Both plpgsql and plperlu are used (plperlu is used for one trigger
> function to post a single https form that sends an sms message, and
> record the result body).

I wonder whether the Perl interpreter is hacking on the SIGALRM
setting.  That would be pretty unfriendly of it (but I don't think
Perl quite believes the notion that it might be only a subroutine
library, and not in full control of the process...)

            regards, tom lane

Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)

From
Mark Aufflick
Date:
ok, so that's not it - i'm definitely not trapping SIGALRM (and btw,
this was only in the perl client code, which I don't see how that could
cause the problem anyway - as opposed to in the plperlu function, which
in any case I am pretty sure was not being called when the server
crashed)

the log entries are:

DEBUG:  server process (pid 20704) was terminated by signal 14
DEBUG:  terminating any other active server processes
NOTICE:  Message from PostgreSQL backend:
         The Postmaster has informed me that some other backend
         died abnormally and possibly corrupted shared memory.
         I have rolled back the current transaction and am
         going to terminate your database system connection and exit.
         Please reconnect to the database system and repeat your query.
[repeated]
FATAL 1:  The database system is in recovery mode
[repeated]
DEBUG:  all server processes terminated; reinitializing shared memory and semaphores
DEBUG:  database system was interrupted at 2003-01-29 02:23:14 EST
DEBUG:  checkpoint record is at 0/1267C284
DEBUG:  redo record is at 0/1267C284; undo record is at 0/0; shutdown FALSE
DEBUG:  next transaction id: 823075; next oid: 134017
DEBUG:  database system was not properly shut down; automatic recovery in progress
FATAL 1:  The database system is starting up
[repeated]
DEBUG:  redo starts at 0/1267C2C4
DEBUG:  ReadRecord: record with zero length at 0/126B3D80
DEBUG:  redo done at 0/126B3D5C
FATAL 1:  The database system is starting up
[repeated]
DEBUG:  database system is ready

with the last NOTICE being repeated for each backend.

any ideas anyone?

Mark.

On Tuesday, January 28, 2003, at 03:42 PM, Tom Lane wrote:

> Mark Aufflick <mark@pumptheory.com> writes:
>> DEBUG:  server process (pid 971) was terminated by signal 14
>
> Hm, that's SIGALRM on my box, I assume so on yours too.
>
> AFAICT, there is no part of the Postgres code that runs with SIGALRM
> set to default handling: it's either SIG_IGN or the deadlock timer
> handler.
>
>> Both plpgsql and plperlu are used (plperlu is used for one trigger
>> function to post a single https form that sends an sms message, and
>> record the result body).
>
> I wonder whether the Perl interpreter is hacking on the SIGALRM
> setting.  That would be pretty unfriendly of it (but I don't think
> Perl quite believes the notion that it might be only a subroutine
> library, and not in full control of the process...)
>
>             regards, tom lane


Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)

From
Tom Lane
Date:
Mark Aufflick <mark@pumptheory.com> writes:
> ok, so that's not it - i'm definitely not trapping SIGALRM (and btw,
> this was only in the perl client code, which I don't see how that could
> cause the problem anyway - as opposed to in the plperlu function, which
> in any case I am pretty sure was not being called when the server
> crashed)

It wouldn't have to be executing when the crash occurred.  If it had
executed at some prior time, and reset the handling of signal 14 at that
time, then you'd get this failure:

> DEBUG:  server process (pid 20704) was terminated by signal 14

whenever the backend process would next have reached a lock timeout.

I have not dug through the Perl sources to look for mucking with
SIGALRM, but I bet that's what the problem is.

            regards, tom lane
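
The failure mode described above can be reproduced outside Postgres. The
sketch below is only an illustration, not backend code: a forked child
(standing in for a backend) resets SIGALRM to its default disposition, arms
a one-second timer, and is terminated when the timer fires. The helper name
run_child_with_default_alarm is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper: fork a child that stands in for a backend whose
# SIGALRM handling has been reset to the default disposition.
sub run_child_with_default_alarm {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        $SIG{ALRM} = 'DEFAULT';           # default action: terminate the process
        alarm 1;                          # timer fires after one second
        select(undef, undef, undef, 10);  # wait; the signal arrives first
        exit 0;                           # not reached
    }

    waitpid($pid, 0);
    return $? & 127;                      # low 7 bits: terminating signal number
}

my $signal = run_child_with_default_alarm();
print "child terminated by signal $signal\n";
```

The parent sees the same kind of exit status the postmaster reports: the
child did not exit normally but was terminated by the alarm signal, even
though nothing "crashed" in the usual sense.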

Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)

From
Mark Aufflick
Date:
now that you made me stop and think, I am guessing that the Net::HTTP
module must use SIGALRM for handling timeouts...

failing finding a way to do without the plperlu trigger altogether, i
guess i will have to save and restore the trap - could be messy.
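
One way to "save and restore the trap" at the Perl level is `local`, which
puts the previous %SIG entry back when the enclosing scope exits. The
following is a minimal sketch under stated assumptions: with_timeout and
the counter $host_alarms are hypothetical names, and the "host" handler here
is an ordinary Perl handler. Whether this also reinstates a handler that
the embedding C program installed (the plperlu case) is not established in
this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the handler the surrounding program relies on.
my $host_alarms = 0;
$SIG{ALRM} = sub { $host_alarms++ };

# Hypothetical helper: run $code under a timeout without leaving our own
# ALRM handler installed afterwards.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $ok = eval {
        # `local` saves the current %SIG entry and restores it when the
        # eval block is left, including when it is left via die.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $code->();
        alarm 0;
        1;
    };
    alarm 0;    # make sure no timer is left pending
    return $ok ? 1 : 0;
}

# The wrapped code waits longer than the timeout allows.
my $finished = with_timeout(1, sub { select(undef, undef, undef, 5) });
print "finished in time: $finished\n";

# The earlier handler is back in place: a new SIGALRM reaches it.
kill 'ALRM', $$;
select(undef, undef, undef, 0.1);
print "host handler calls: $host_alarms\n";
```

The point of the design is that the temporary handler never outlives the
call that needed it, so later alarms are handled by whatever was installed
before.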


On Wednesday, January 29, 2003, at 02:41 AM, Tom Lane wrote:

> Mark Aufflick <mark@pumptheory.com> writes:
>> ok, so that's not it - i'm definitely not trapping SIGALRM (and btw,
>> this was only in the perl client code, which I don't see how that could
>> cause the problem anyway - as opposed to in the plperlu function, which
>> in any case I am pretty sure was not being called when the server
>> crashed)
>
> It wouldn't have to be executing when the crash occurred.  If it had
> executed at some prior time, and reset the handling of signal 14 at that
> time, then you'd get this failure:
>
>> DEBUG:  server process (pid 20704) was terminated by signal 14
>
> whenever the backend process would next have reached a lock timeout.
>
> I have not dug through the Perl sources to look for mucking with
> SIGALRM, but I bet that's what the problem is.
>
>             regards, tom lane


Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)

From
Vivek Khera
Date:
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>> DEBUG:  server process (pid 20704) was terminated by signal 14

TL> whenever the backend process would next have reached a lock timeout.

TL> I have not dug through the Perl sources to look for mucking with
TL> SIGALRM, but I bet that's what the problem is.


From what I recall, perl takes charge of all signals in order to
deliver them at safe points to your perl program.  This is both good
and bad.  mod_perl has to deal with perl taking signals from apache.
Perhaps that code could be worth a read.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)

From
Mark Aufflick
Date:
Ahhh,

yes, um, (looks to see if anyone noticed) that would be the:

use sigtrap qw(die untrapped normal-signals stack-trace any error-signals);

line in my code...

i will get rid of the 'untrapped normal-signals' and report back.

ta.

On Tuesday, January 28, 2003, at 03:42 PM, Tom Lane wrote:

> Mark Aufflick <mark@pumptheory.com> writes:
>> DEBUG:  server process (pid 971) was terminated by signal 14
>
> Hm, that's SIGALRM on my box, I assume so on yours too.
>
> AFAICT, there is no part of the Postgres code that runs with SIGALRM
> set to default handling: it's either SIG_IGN or the deadlock timer
> handler.
>
>> Both plpgsql and plperlu are used (plperlu is used for one trigger
>> function to post a single https form that sends an sms message, and
>> record the result body).
>
> I wonder whether the Perl interpreter is hacking on the SIGALRM
> setting.  That would be pretty unfriendly of it (but I don't think
> Perl quite believes the notion that it might be only a subroutine
> library, and not in full control of the process...)
>
>             regards, tom lane