Discussion: SIGTERM does not stop backend postgres processes immediately
It seems that postgres backend processes built with Cygwin do not react to the SIGTERM signal immediately. Instead, they remain blocked on a recv() call deep under ReadCommand() and don't notice the signal until data comes in over the socket connection and unblocks recv(). This prevents a 'fast' stop of the whole PostgreSQL instance from working correctly.

I'm seeing this problem in Cygwin 1.3.1 with cygipc-1.09-2, using PostgreSQL built from source based on a very recent CVS snapshot.

This problem sounds similar to one reported in the pgsql-ports list earlier this year [1]. That thread concludes that it's a Cygwin problem, but with no solution yet. Has there been any progress since then?

[1] http://postgresql.readysetnet.com/mhonarc/pgsql-ports/2001-01/msg00023.html

--
Fred Yankowski        fred@OntoSys.com    tel: +1.630.879.1312
Principal Consultant  www.OntoSys.com     fax: +1.630.879.1370
OntoSys, Inc          38W242 Deerpath Rd, Batavia, IL 60510, USA
Fred,

On Tue, May 08, 2001 at 02:24:27PM -0500, Fred Yankowski wrote:
> This problem sounds similar to one reported in the pgsql-ports list
> earlier this year [1]. That thread concludes that it's a Cygwin
> problem, but with no solution yet. Has there been any progress since
> then?
>
> [1] http://postgresql.readysetnet.com/mhonarc/pgsql-ports/2001-01/msg00023.html

Sorry for the dangling thread -- the discussion was moved over to the cygwin-developers list:

    http://www.cygwin.com/ml/cygwin-developers/2001-02/msg00019.html

So, AFAICT the problem in [1] has been "solved."

However, I have not built PostgreSQL with Cygwin 1.3.1 -- I have only run it against Cygwin 1.3.1. What happens when you run make check? Does the postmaster exit cleanly at the end of the regression test as expected? Or, does it hang?

Jason

--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
I just ran 'make check' for postgres and all 76 tests passed.

The problem I'm seeing, where a postgres backend process doesn't react immediately to SIGTERM, occurs even when there is only one such backend process, so this may be a different problem from the one described in those earlier threads and recently fixed in CVS.

I'm seeing this problem as I test my patch for running postgres as an NT service. But I just tried running postmaster directly from the shell and I see the same problem. Here's a scenario, using three bash windows, with the steps in chronological order:

[window 1]  postmaster -i -D /usr/local/pgsql/data.test/ -d 1
            ### database comes up to "production state"

[window 2]  psql -h localhost template1
            ### starts up OK and prompts for a command

[window 3]  ps -ef
            ### 2 postgres processes (one is actually the
            ### postmaster) and 1 psql process

[window 3]  pg_ctl -D /usr/local/pgsql/data.test/ -m fast stop
            ### reports "waiting" and many dots

[window 1]  ### "Fast Shutdown request" message appears

[window 3]  ### times out and reports "failed"

[nothing more happens (which is the problem to be solved) until I do ...]

[window 2]  \d
            ### [Any command to the backend would do.]
            ### "connection terminated" message appears

[window 1]  ### "database system is shut down" appears.

[window 3]  ps -ef
            ### the postgres processes are gone.

I know from inserting printfs into the backend code that the SIGTERM signal handler function is not being called right after the stop request. Rather, it is called only after the backend gets some data over its input socket connection, from that "\d" I did in psql in this case. It seems that the recv() call deep in the backend code does not get interrupted by the SIGTERM.

On Tue, May 08, 2001 at 10:05:19PM -0400, Jason Tishler wrote:
> However, I have not built PostgreSQL with Cygwin 1.3.1 -- I have only run
> it against Cygwin 1.3.1. What happens when you run make check? Does the
> postmaster exit cleanly at the end of the regression test as expected?

I'm a little confused about the distinction you're making between "Cygwin 1.3.1" and "Cygwin 1.3.1". ;-)

Anyway, "make check" completes without any errors. No apparent hangs.

--
Fred Yankowski
Fred,

On Wed, May 09, 2001 at 09:40:31AM -0500, Fred Yankowski wrote:
> The problem I'm seeing, where a postgres backend process doesn't react
> immediately to SIGTERM, occurs even when there is only one such
> backend process, so this may be a different problem from the one
> described in those earlier threads and recently fixed in CVS.

This is my assessment too.

> I'm seeing this problem as I test my patch for running postgres as an
> NT service. But I just tried running postmaster directly from the
> shell and I see the same problem.

I was able to reproduce your finding under Cygwin too. When I repeated the experiment under Linux, postmaster shut down as expected.

> I know from inserting printfs into the backend code that the SIGTERM
> signal handler function is not being called right after the stop
> request. Rather, it is called only after the backend gets some data
> over its input socket connection, from that "\d" I did in psql in
> this case. It seems that the recv() call deep in the backend code
> does not get interrupted by the SIGTERM.

IMO, you have found a Cygwin bug. Please report it to the Cygwin list. Hopefully, Mr. Signal is listening and will jump into action...

Can you produce a minimal test case that demonstrates the problem?

> On Tue, May 08, 2001 at 10:05:19PM -0400, Jason Tishler wrote:
> > However, I have not built PostgreSQL with Cygwin 1.3.1 -- I have only run
> > it against Cygwin 1.3.1. What happens when you run make check? Does the
> > postmaster exit cleanly at the end of the regression test as expected?
>
> I'm a little confused about the distinction you're making between
> "Cygwin 1.3.1" and "Cygwin 1.3.1". ;-)

Sorry for being unclear. What I was trying to say was that my builds of PostgreSQL are really against Cygwin 1.1.8 (with only cygwin1.dll replaced to work around the mmap/fork problem). I have never built against Cygwin 1.3.1. However, I do run against Cygwin 1.3.1 on one of my test machines.

> Anyway, "make check" completes without any errors. No apparent hangs.

Which again confirms that this is a different and yet to be solved problem.

Jason

--
Jason Tishler
On Wed, May 09, 2001 at 02:26:29PM -0400, Jason Tishler wrote:
>> I know from inserting printfs into the backend code that the SIGTERM
>> signal handler function is not being called right after the stop
>> request. Rather, it is called only after the backend gets some data
>> over its input socket connection, from that "\d" I did in psql in
>> this case. It seems that the recv() call deep in the backend code
>> does not get interrupted by the SIGTERM.
>
>IMO, you have found a Cygwin bug. Please report it to the Cygwin list.
>Hopefully, Mr. Signal is listening and will jump into action...

Unfortunately, blocking recv() calls are not interruptible on Windows. I'm not aware of any mechanism for allowing this.

cgf
Christopher Faylor wrote:
>
> On Wed, May 09, 2001 at 02:26:29PM -0400, Jason Tishler wrote:
> >> I know from inserting printfs into the backend code that the SIGTERM
> >> signal handler function is not being called right after the stop
> >> request. Rather, it is called only after the backend gets some data
> >> over its input socket connection, from that "\d" I did in psql in
> >> this case. It seems that the recv() call deep in the backend code
> >> does not get interrupted by the SIGTERM.

How about inserting a select() call before the recv()?
Cygwin's select() is interruptible AFAIK.

regards,
Hiroshi Inoue
Hiroshi Inoue wrote:
>
> Christopher Faylor wrote:
> >
> > On Wed, May 09, 2001 at 02:26:29PM -0400, Jason Tishler wrote:
> > >> I know from inserting printfs into the backend code that the SIGTERM
> > >> signal handler function is not being called right after the stop
> > >> request. Rather, it is called only after the backend gets some data
> > >> over its input socket connection, from that "\d" I did in psql in
> > >> this case. It seems that the recv() call deep in the backend code
> > >> does not get interrupted by the SIGTERM.
>
> How about inserting a select() call before the recv()?
> Cygwin's select() is interruptible AFAIK.

I see the following reply from Chris in the Cygwin archive (I'm not a member of that list):

    That would be the "workaround" that I kept mentioning previously.
    It relies on polling and that is something I'd rather avoid, if
    possible.

My proposal was directed at pgsql-cygwin, not at Cygwin, from the start. The following is an example.

Comments?

regards,
Hiroshi Inoue

    {
    #ifdef __CYGWIN__
        /*
         * Cygwin's recv() is not interruptible by signals, so block in
         * select() first and call recv() only once data is available.
         */
        fd_set      rmask;
        int         nsocks;

        FD_ZERO(&rmask);
        FD_SET(MyProcPort->sock, &rmask);
        nsocks = MyProcPort->sock + 1;
        if (select(nsocks, &rmask, (fd_set *) NULL, (fd_set *) NULL,
                   (struct timeval *) NULL) < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            fprintf(stderr, "pq_recvbuf: select() failed: %s\n",
                    strerror(errno));
            return EOF;
        }
    #endif   /* __CYGWIN__ */
        r = recv(MyProcPort->sock, PqRecvBuffer + PqRecvLength,
                 PQ_BUFFER_SIZE - PqRecvLength, 0);
    }
Corinna,

On Tue, May 15, 2001 at 11:20:54AM +0200, Corinna Vinschen wrote:
> On Fri, May 11, 2001 at 09:09:28AM +1000, Robert Collins wrote:
> > Blueskying a concept here: what about cygwin opening all sockets in
> > non-blocking mode, and if the app thinks that it is a blocking call wait
> > on the socket && on a signal event?
> >
> > Obviously not trivial to get working right, but
> > a) would it work on 95?
> > b) thoughts?
>
> b) I have just applied a patch to Cygwin which uses overlapped IO
>    together with the Winsock2 calls WSARecv, WSARecvFrom, WSASend
>    and WSASendTo if available. The new mechanism is interruptible
>    by signals. If Winsock2 is not available the new implementation
>    just falls back to using the non-interruptible Winsock1 calls.
>
>    I would like to ask people to test it especially in conjunction
>    with PostgreSQL, which I haven't set up.

I just tried my Cygwin PostgreSQL 7.1.1 distribution against the latest Cygwin CVS and the above mentioned patch solves the postmaster shutdown problem. Now Cygwin PostgreSQL behaves identically to UNIX PostgreSQL with regard to shutdown:

    1. pg_ctl stop (i.e., kill -s SIGTERM) causes postmaster to wait
       for all clients to disconnect before shutting down.

    2. pg_ctl -m fast stop (i.e., kill -s SIGINT) causes postmaster to
       shut down immediately (but cleanly) without waiting for all
       clients to disconnect.

Your patch fixed case 2 above and I believe this is the last piece needed by Fred Yankowski to complete his PostgreSQL NT service patch.

Thank you very much for this patch -- it is really appreciated.

Jason

--
Jason Tishler
Hiroshi,

On Tue, May 15, 2001 at 10:30:39AM +0900, Hiroshi Inoue wrote:
> Hiroshi Inoue wrote:
> > Christopher Faylor wrote:
> > > On Wed, May 09, 2001 at 02:26:29PM -0400, Jason Tishler wrote:
> > > >> I know from inserting printfs into the backend code that the SIGTERM
> > > >> signal handler function is not being called right after the stop
> > > >> request. Rather, it is called only after the backend gets some data
> > > >> over its input socket connection, from that "\d" I did in psql in
> > > >> this case. It seems that the recv() call deep in the backend code
> > > >> does not get interrupted by the SIGTERM.
> >
> > How about inserting a select() call before the recv()?
> > Cygwin's select() is interruptible AFAIK.
>
> I see the following reply from Chris in cygwin's archive (I'm not
> the member).
>
>     That would be the "workaround" that I kept mentioning previously.
>     It relies on polling and that is something I'd rather avoid, if
>     possible.
>
> My proposal is to pgsql-cygwin not to cygwin from the first.
> The following is an example.
>
> Comments ?
>
> [patch snipped]

Your patch is no longer needed since Cygwin's recv() is now interruptible. See the following for details:

    http://cygwin.com/ml/cygwin/2001-05/msg00752.html
    http://cygwin.com/ml/cygwin/2001-05/msg00774.html

Still, I do appreciate your efforts to come up with a workaround.

Thanks,
Jason

--
Jason Tishler
On Tue, May 15, 2001 at 10:10:36AM -0400, Jason Tishler wrote:
> I just tried my Cygwin PostgreSQL 7.1.1 distribution against the latest
> Cygwin CVS and the above mentioned patch solves the postmaster shutdown
> problem. Now Cygwin PostgreSQL behaves identically to UNIX PostgreSQL
> with regard to shutdown:

Wow, this is great! It's a pleasure to see capable Cygwin developers -- Corinna and Jason in particular, along with the others who posted suggested ways to fix the problem -- dig in and solve problems.

> Your patch fixed case 2 above and I believe this is the last piece needed
> by Fred Yankowski to complete his PostgreSQL NT service patch.

I will resume work on that immediately. The other problem I've been facing is how to handle the SIGHUP that Cygwin generates in response to system shutdown. Some quick tests show that simply ignoring (SIG_IGN) the signal works, but that defeats the use of SIGHUP to force the instance to re-read the configuration file. It may be, however, that fixing the recv() problem also fixes the problem where getting a SIGHUP in the midst of stopping PostgreSQL seemed to mess up the PostgreSQL state. I'll check...

--
Fred Yankowski