Discussion: SIGCHLD handler in Postgres C function.


SIGCHLD handler in Postgres C function.

From: spshealy@yahoo.com
Date:
I was wondering if some of you Postgres hackers could advise me on the
safety of the following.  I have written a Postgres C function that
uses popen() on Linux. Originally, when I first tried it, I kept
getting an ECHILD error.  I read a little more about the pclose function
and the wait system calls and discovered that on Linux, if the signal
handler for SIGCHLD is set to SIG_IGN, you will get the ECHILD error
from pclose (or from wait4, for that matter).  So I did some snooping around in
the Postgres backend code and found that the traffic cop (tcop) sets the
SIGCHLD signal handler to SIG_IGN.  So in my C function, right
before the popen call, I set the signal handler for SIGCHLD to SIG_DFL,
and right after the pclose I set it back to SIG_IGN.  I tested this
and it seems to solve my problem.  Not knowing much about the
internals of the Postgres backend, I would like to know: is setting
the signal handler to SIG_DFL temporarily going to do anything funky
with my database or the backend?

Thanks in advance for your insights,
Scott Shealy
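For reference, here is a minimal standalone sketch of the workaround described above, written as an ordinary POSIX program rather than a real Postgres C function; the helper name `popen_with_default_sigchld` is hypothetical. One deliberate difference, an assumption on our part: it saves the caller's SIGCHLD disposition with `sigaction()` and restores that, instead of hard-coding SIG_IGN afterwards.

```c
#define _XOPEN_SOURCE 700   /* expose popen(), pclose(), sigaction() */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not a PostgreSQL API): run `cmd` with popen(),
 * read the first line of its output into `out`, and return pclose()'s
 * wait status, or -1 on failure. SIGCHLD is set to SIG_DFL only for the
 * duration of the call; the caller's previous disposition is restored.
 */
static int popen_with_default_sigchld(const char *cmd, char *out, size_t outlen)
{
    struct sigaction dfl, saved;
    FILE *fp;
    int status;

    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);

    /* Remember the current disposition instead of assuming SIG_IGN. */
    if (sigaction(SIGCHLD, &dfl, &saved) != 0)
        return -1;

    fp = popen(cmd, "r");
    if (fp == NULL) {
        sigaction(SIGCHLD, &saved, NULL);
        return -1;
    }

    if (out != NULL && outlen > 0 && fgets(out, (int) outlen, fp) == NULL)
        out[0] = '\0';

    /* With SIG_DFL the child is not auto-reaped, so pclose() can wait for it. */
    status = pclose(fp);

    sigaction(SIGCHLD, &saved, NULL);
    return status;
}
```

Usage: with SIGCHLD set to SIG_IGN beforehand, `popen_with_default_sigchld("echo hello; exit 3", buf, sizeof buf)` returns a wait status for which `WEXITSTATUS()` is 3, and SIGCHLD is back at SIG_IGN afterwards.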


Re: SIGCHLD handler in Postgres C function.

From: Tom Lane
Date:
spshealy@yahoo.com writes:
> I have written a postgres C function that
> uses a popen linux system call. Originally when I first tried it I kept
> getting an ECHILD.  I read a little bit more on the pclose function
> and the wait system calls and discovered that on LINUX if the signal
> handler for SIGCHLD is set to SIG_IGN you will get the ECHILD error
> on pclose (or wait4 for that matter).  So I did some snooping around in
> the postgres backend code and found that in the traffic cop the
> SIGCHLD signal handler is set to SIG_IGN.  So in my C function right
> before the popen call I set the signal handler for SIGCHLD to SIG_DFL
> and right after the pclose I set it back to SIG_IGN.  I tested this
> and it seems to solve my problem.

Hmm.  A possibly related bit of ugliness can be found in
src/backend/commands/dbcommands.c, where we ignore ECHILD after
a system() call:

    ret = system(buf);
    /* Some versions of SunOS seem to return ECHILD after a system() call */
    if (ret != 0 && errno != ECHILD)
    {

Interesting, no?  I wonder whether we could get rid of that kluge
if the signal handler was SIG_DFL rather than SIG_IGN.  Can anyone
try this on one of the affected versions of SunOS?  (Tatsuo, you
seem to have added the ECHILD exception on May 25 2000; the commit
message mentions Solaris but not which version.  Could you try it?)

What I'd be inclined to do, rather than swapping the handlers around
while running, is to just have backend startup (tcop/postgres.c) set
the handler to SIG_DFL not SIG_IGN in the first place.  That *should*
produce the identical results according to my man pages, but evidently
it's not quite the same thing on some systems.

Changing this might be a zero-cost solution to a portability glitch.
Comments anyone?
        regards, tom lane
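A rough sketch of the alternative proposed above, assuming a hypothetical startup hook `init_backend_signals()` and a hypothetical `run_checked()` wrapper (neither is actual PostgreSQL code): SIGCHLD is left at SIG_DFL once at startup, so `system()` can wait for its own child and its result can be checked without an ECHILD exception. `errno` is consulted only when `system()` returns -1; a non-zero wait status comes from the command itself.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical startup hook: leave SIGCHLD at SIG_DFL rather than SIG_IGN. */
static void init_backend_signals(void)
{
    signal(SIGCHLD, SIG_DFL);
}

/*
 * Hypothetical wrapper: returns 0 only if the shell ran `cmd` and it
 * exited normally with status 0; otherwise reports the problem and
 * returns -1. There is no ECHILD special case.
 */
static int run_checked(const char *cmd)
{
    int ret = system(cmd);

    if (ret == -1) {
        /* fork or wait failed inside system(); errno is meaningful here */
        perror("system");
        return -1;
    }
    if (!WIFEXITED(ret) || WEXITSTATUS(ret) != 0) {
        /* the command itself failed; errno says nothing about this */
        fprintf(stderr, "command \"%s\" failed, wait status %d\n", cmd, ret);
        return -1;
    }
    return 0;
}
```

The design choice is simply to never reset the disposition at run time, which avoids the window in which a handler swap could interfere with other code.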


Re: SIGCHLD handler in Postgres C function.

From: Tatsuo Ishii
Date:
> spshealy@yahoo.com writes:
> > I have written a postgres C function that
> > uses a popen linux system call. Originally when I first tried it I kept
> > getting an ECHILD.  I read a little bit more on the pclose function
> > and the wait system calls and discovered that on LINUX if the signal
> > handler for SIGCHLD is set to SIG_IGN you will get the ECHILD error
> > on pclose (or wait4 for that matter).  So I did some snooping around in
> > the postgres backend code and found that in the traffic cop the
> > SIGCHLD signal handler is set to SIG_IGN.  So in my C function right
> > before the popen call I set the signal handler for SIGCHLD to SIG_DFL
> > and right after the pclose I set it back to SIG_IGN.  I tested this
> > and it seems to solve my problem.
> 
> Hmm.  A possibly related bit of ugliness can be found in
> src/backend/commands/dbcommands.c, where we ignore ECHILD after
> a system() call:
> 
>     ret = system(buf);
>     /* Some versions of SunOS seem to return ECHILD after a system() call */
>     if (ret != 0 && errno != ECHILD)
>     {
> 
> Interesting, no?  I wonder whether we could get rid of that kluge
> if the signal handler was SIG_DFL rather than SIG_IGN.  Can anyone
> try this on one of the affected versions of SunOS?  (Tatsuo, you
> seem to have added the ECHILD exception on May 25 2000; the commit
> message mentions Solaris but not which version.  Could you try it?)

It was Solaris 2.6.

>Subject: [HACKERS] Solaris 2.6 problems
>From: Tatsuo Ishii <t-ishii@sra.co.jp>
>To: pgsql-hackers@postgresql.org
>Cc: t-ishii@sra.co.jp
>Date: Wed, 24 May 2000 18:28:25 +0900
>X-Mailer: Mew version 1.93 on Emacs 19.34 / Mule 2.3 (SUETSUMUHANA)
>
>Hi, I have encountered a really strange problem with PostgreSQL 7.0 on
>Solaris 2.6/Sparc. The problem is that the createdb command or the CREATE
>DATABASE SQL command always fails. Inspecting the output of truss shows that
>the system() call in createdb() (commands/dbcommands.c) fails because
>the waitid() system call in system() returns error no. 10 (ECHILD).
>
>This problem was not in 6.5.3, so I checked the source of it. The
>reason why 6.5.3's createdb worked was that it just ignored the return
>code of system()!
>
>It seems that we need to ignore an error from system() if the error is
>ECHILD on Solaris.
>
>Any idea?
>
>BTW, I have compiled PostgreSQL with egcs 2.95 with/without
>optimization.
>--
>Tatsuo Ishii
>


Re: SIGCHLD handler in Postgres C function.

From: Bill Studenmund
Date:
On Sun, 22 Jul 2001, Tatsuo Ishii wrote:

> > spshealy@yahoo.com writes:
> > > I have written a postgres C function that
> > > uses a popen linux system call. Originally when I first tried it I kept
> > > getting an ECHILD.  I read a little bit more on the pclose function
> > > and the wait system calls and discovered that on LINUX if the signal
> > > handler for SIGCHLD is set to SIG_IGN you will get the ECHILD error
> > > on pclose (or wait4 for that matter).  So I did some snooping around in
> > > the postgres backend code and found that in the traffic cop the
> > > SIGCHLD signal handler is set to SIG_IGN.  So in my C function right
> > > before the popen call I set the signal handler for SIGCHLD to SIG_DFL
> > > and right after the pclose I set it back to SIG_IGN.  I tested this
> > > and it seems to solve my problem.

Just ignore ECHILD. It's not messy at all. :-) It sounds like your kernel
is using SIG_IGN to do the same thing as the SA_NOCLDWAIT flag in *BSD
(well NetBSD at least). When a child dies, it gets re-parented to init
(which is wait()ing). init does the child-died cleanup, rather than the
parent needing to. That way when the parent runs wait(), there is no
child, so you get an ECHILD.

All ECHILD is doing is saying there was no child. Since we aren't really
waiting for the child, I don't see how that's a problem.

Take care,

Bill
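The automatic reaping described above can be observed directly. This is a minimal demonstration, assuming a POSIX system that honors the `SA_NOCLDWAIT` flag; the function name is hypothetical. The child exits immediately, the kernel reaps it because of the flag, and the parent's `waitpid()` then has no child to collect and fails with ECHILD.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction(), SA_NOCLDWAIT, fork() */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: with SA_NOCLDWAIT set on SIGCHLD, a terminated child
 * is reaped automatically, so waitpid() finds no child. Returns the errno
 * reported by fork()/waitpid(), or 0 if waitpid() unexpectedly succeeded.
 */
static int waitpid_errno_with_nocldwait(void)
{
    struct sigaction act, saved;
    pid_t pid;
    int status, err = 0;

    memset(&act, 0, sizeof act);
    act.sa_handler = SIG_DFL;
    act.sa_flags = SA_NOCLDWAIT;     /* request automatic child reaping */
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGCHLD, &act, &saved) != 0)
        return errno;

    pid = fork();
    if (pid == 0)
        _exit(0);                    /* child: exit immediately */

    if (pid < 0)
        err = errno;                 /* fork failed */
    else if (waitpid(pid, &status, 0) == -1)
        err = errno;                 /* expected: ECHILD */

    sigaction(SIGCHLD, &saved, NULL);
    return err;
}
```

This is exactly the situation `system()` and `pclose()` run into: their internal wait has nothing left to reap.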



Re: SIGCHLD handler in Postgres C function.

From: Tom Lane
Date:
Bill Studenmund <wrstuden@zembu.com> writes:
> All ECHILD is doing is saying there was no child. Since we aren't really
> waiting for the child, I don't see how that's a problem.

You're missing the point: on some platforms the system() call is
returning a failure indication because of ECHILD.  It's system() that's
broken, not us, and the issue is how to work around its brokenness
without sacrificing more error detection than we have to.
        regards, tom lane
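One way to keep more error detection than the `if (ret != 0 && errno != ECHILD)` test quoted earlier is sketched below, assuming (as the Solaris 2.6 report suggests) that the broken `system()` signals the problem by returning -1 with `errno` set to ECHILD. The hypothetical `system_tolerating_echild()` tolerates ECHILD only in that case, so a genuine non-zero exit status from the command is still reported, and `errno` is not consulted for it. The limitation remains: when ECHILD is tolerated, the command's real exit status is unknown, so a failed command can still slip through.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical narrower workaround: returns 0 if `cmd` exited with status 0,
 * or if system() failed only because its child had already been reaped
 * (ret == -1 with errno == ECHILD); returns -1 otherwise.
 * In the ECHILD case the command's real exit status is unknown.
 */
static int system_tolerating_echild(const char *cmd)
{
    int ret;

    errno = 0;
    ret = system(cmd);
    if (ret == -1)
        return errno == ECHILD ? 0 : -1;

    /* A real wait status: judge it on its own, without looking at errno. */
    return (WIFEXITED(ret) && WEXITSTATUS(ret) == 0) ? 0 : -1;
}
```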


Re: SIGCHLD handler in Postgres C function.

From: Bill Studenmund
Date:
On Mon, 30 Jul 2001, Tom Lane wrote:

> Bill Studenmund <wrstuden@zembu.com> writes:
> > All ECHILD is doing is saying there was no child. Since we aren't really
> > waiting for the child, I don't see how that's a problem.
>
> You're missing the point: on some platforms the system() call is
> returning a failure indication because of ECHILD.  It's system() that's
> broken, not us, and the issue is how to work around its brokenness
> without sacrificing more error detection than we have to.

I think I do get the point. But perhaps I didn't make my point well. :-)

I think the problem is that on some OSs, setting SIGCHLD to SIG_IGN
actually triggers automatic child reaping. So the problem is that we are:
1) setting SIGCHLD to SIG_IGN, 2) calling system(), and 3) thinking ECHILD
means something was really wrong.

I think 4.4BSD systems will do what we expect (as the SA_NOCLDWAIT flag
requests child reaping), but Linux systems will give us the ECHILD.
Looking at source on the web, I found:

kernel/signal.c:1042

* Note the silly behaviour of SIGCHLD: SIG_IGN means that the
* signal isn't actually ignored, but does automatic child
* reaping, while SIG_DFL is explicitly said by POSIX to force
* the signal to be ignored.

So we get automatic reaping on Linux systems (which isn't bad).

If automatic reaping happens, system() will give us an ECHILD, as the waitpid
(or equivalent) will not have found a child. :-)

My suggestion is just to leave the ifs as "if ((error == 0) || (error ==
ECHILD))" (or the inverse).

Take care,

Bill
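To see the effect of that Linux comment on `system()` itself, the following sketch runs a command with SIGCHLD temporarily set to a given disposition; the helper `system_under()` is hypothetical. On Linux with glibc, as far as we can tell, SIG_IGN leads to automatic reaping, so `system()`'s internal wait finds no child and `system()` returns -1, which matches the ECHILD failures reported in this thread; with SIG_DFL the same command returns its normal wait status. Other systems may behave differently.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/*
 * Hypothetical helper: run `cmd` via system() with SIGCHLD temporarily set
 * to `disp` (SIG_IGN or SIG_DFL), restoring the old disposition afterwards.
 * Returns system()'s result, or -1 if the disposition could not be changed.
 */
static int system_under(void (*disp)(int), const char *cmd)
{
    void (*prev)(int) = signal(SIGCHLD, disp);
    int ret;

    if (prev == SIG_ERR)
        return -1;
    ret = system(cmd);        /* under SIG_IGN on Linux: child is auto-reaped */
    signal(SIGCHLD, prev);    /* put the previous disposition back */
    return ret;
}
```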



Re: SIGCHLD handler in Postgres C function.

From: Tom Lane
Date:
Bill Studenmund <wrstuden@zembu.com> writes:
> Looking at source on the web, I found:

> kernel/signal.c:1042

> * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
> * signal isn't actually ignored, but does automatic child
> * reaping, while SIG_DFL is explicitly said by POSIX to force
> * the signal to be ignored.

Hmm, interesting.  If you'll recall, the start of this thread was a
proposal to change our backends' handling of SIGCHLD from SIG_IGN to
SIG_DFL (and get rid of explicit tests for ECHILD).  I didn't quite see
why changing the handler should make a difference, but above we seem to
have the smoking gun.

Which kernel, and which version, is the above quote from?
        regards, tom lane


Re: SIGCHLD handler in Postgres C function.

From: Bill Studenmund
Date:
On Mon, 30 Jul 2001, Tom Lane wrote:

> Bill Studenmund <wrstuden@zembu.com> writes:
> > Looking at source on the web, I found:
>
> > kernel/signal.c:1042
>
> > * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
> > * signal isn't actually ignored, but does automatic child
> > * reaping, while SIG_DFL is explicitly said by POSIX to force
> > * the signal to be ignored.
>
> Hmm, interesting.  If you'll recall, the start of this thread was a
> proposal to change our backends' handling of SIGCHLD from SIG_IGN to
> SIG_DFL (and get rid of explicit tests for ECHILD).  I didn't quite see
> why changing the handler should make a difference, but above we seem to
> have the smoking gun.
>
> Which kernel, and which version, is the above quote from?

Linux kernel source, 2.4.3, I think the i386 version (though it should be the
same for this bit; it's supposed to be machine-independent). Check out
http://lxr.linux.no/source/

I do recall the reason for the thread. :-) I see three choices:

1) Change back to SIG_DFL for normal behavior. I think this will be fine, as
we run without problems on systems that lack this behavior. If turning off
automatic child reaping would cause a problem, we'd have seen it already on
the OSs which don't automatically reap children. Will a backend ever fork
after it's started?

2) Change to SIG_DFL around system() and then change back.

3) Realize that ECHILD means that the child was auto-reaped (which is an OK
thing and, I think, will only happen if the child exited without error).

Take care,

Bill



Re: SIGCHLD handler in Postgres C function.

From: Bruce Momjian
Date:
> Bill Studenmund <wrstuden@zembu.com> writes:
> > Looking at source on the web, I found:
> 
> > kernel/signal.c:1042
> 
> > * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
> > * signal isn't actually ignored, but does automatic child
> > * reaping, while SIG_DFL is explicitly said by POSIX to force
> > * the signal to be ignored.
> 
> Hmm, interesting.  If you'll recall, the start of this thread was a
> proposal to change our backends' handling of SIGCHLD from SIG_IGN to
> SIG_DFL (and get rid of explicit tests for ECHILD).  I didn't quite see
> why changing the handler should make a difference, but above we seem to
> have the smoking gun.
> 
> Which kernel, and which version, is the above quote from?

The auto-reaping is standard SysV behavior, while on BSD SIG_IGN really just
ignores the signal. See Stevens' Unix programming book for more info.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026