Обсуждение: Re: postgresql 6.3.2
Sending this to the hackers list to see if anyone knows anything about it. One of our developers had the problem too, but I thought it was fixed. > > Hello, > > Sorry to send you this message directly to you, but I'm currently not > suscribed to any psql mailing list and this is the only way I thought to > know if my email has been received. > I've sent you some messages in the past regarding the Linux fflush() problem, > the one that made psql receive a EPIPE signal. > I've downloaded the latest available version 6.3.2 and has the same > problem. > If you don't remember, there was a fflush() called in fe-misc.c, exactly > in pqPuts() function that receives EPIPE and it's currently not ignored. > I don't know if this is Linux specific or not, but it'd be great to fix > the problem in the distribution. Do you agree? -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> > I've sent you some messages in the past regarding the Linux fflush() > > problem, the one that made psql receive a EPIPE signal. > > I've downloaded the latest available version 6.3.2 and has the > > same problem. > > If you don't remember, there was a fflush() called in fe-misc.c, > > exactly in pqPuts() function that receives EPIPE and it's currently > > not ignored. > > I don't know if this is Linux specific or not, but it'd be great to > > fix the problem in the distribution. Do you agree? Yes, I recall the "broken pipe" problem and thought that someone had fixed it (most platforms didn't seem to see the problem, but Linux did). I'm not currently running v6.3.2, having rev-locked on 980408 to get some development done for v6.4. Did you supply a patch to fix the problem earlier? - Tom
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: >>>> If you don't remember, there was a fflush() called in fe-misc.c, >>>> exactly in pqPuts() function that receives EPIPE and it's currently >>>> not ignored. > Yes, I recall the "broken pipe" problem and thought that someone had > fixed it (most platforms didn't seem to see the problem, but Linux did). fe-connect.c is set up to ignore SIGPIPE while trying to shut down the connection. (The 6.3.2 release is broken on non-POSIX-signal platforms, because it resets SIGPIPE to SIG_DFL afterwards, which may well not be what the application wanted. I've fixed that in the version that I plan to submit soon.) There is no equivalent code to ignore SIGPIPE during ordinary writes to the backend. I'm hesitant to add it on the following grounds: 1. Speed: a write would need three kernel calls instead of one. 2. I'm suspicious of code that alters signal settings during normal operation. Especially in a library that can't know what else is going on in the application. Disabling the application's signal handler, even for short intervals, is best avoided. 3. It's only an issue if the backend crashes, which shouldn't happen anyway ... shouldn't happen anyway ... shouldn't ... ;-) The real question is what scenario is causing SIGPIPE to be delivered in the first place. A search of the pgsql-hackers archives for "SIGPIPE" yields only a mention of seeing SIGPIPE some of the time (not always) when trying to connect to a nonexistent database. If that's what's being complained of here, I'll try to look into it. regards, tom lane
> > Yes, I recall the "broken pipe" problem and thought that someone had > > fixed it (most platforms didn't seem to see the problem, but Linux > > did). > fe-connect.c is set up to ignore SIGPIPE while trying to shut down the > connection. (The 6.3.2 release is broken on non-POSIX-signal > platforms, because it resets SIGPIPE to SIG_DFL afterwards, which may > well not be what the application wanted. I've fixed that in the > version that I plan to submit soon.) > There is no equivalent code to ignore SIGPIPE during ordinary writes > to the backend. I'm hesitant to add it on the following grounds: > 1. Speed: a write would need three kernel calls instead of one. > 2. I'm suspicious of code that alters signal settings during normal > operation. Especially in a library that can't know what else is > going on in the application. Disabling the application's signal > handler, even for short intervals, is best avoided. > 3. It's only an issue if the backend crashes, which shouldn't happen > anyway ... shouldn't happen anyway ... shouldn't ... ;-) > The real question is what scenario is causing SIGPIPE to be delivered > in the first place. A search of the pgsql-hackers archives for > "SIGPIPE" yields only a mention of seeing SIGPIPE some of the time > (not always) when trying to connect to a nonexistent database. > If that's what's being complained of here, I'll try to look into it. golem$ psql nada Connection to database 'nada' failed. FATAL 1: Database nada does not exist in pg_database golem$ psql nada Broken pipe golem$ This is on a Linux box with Postgres code frozen on 980408. I assume that full v6.3.2 exhibits the same... - Tom
I said: > The real question is what scenario is causing SIGPIPE to be delivered > in the first place. A search of the pgsql-hackers archives for > "SIGPIPE" yields only a mention of seeing SIGPIPE some of the time > (not always) when trying to connect to a nonexistent database. OK, I've been able to reproduce this; I understand the problem and I have a proposed fix. Here's the scenario. On the server side, this happens: Postmaster receives new connection request from client (possible authentication cycle here) Postmaster sends "AUTH OK" to client Postmaster forks backend Backend discovers that database name is invalid Backend sends error message Backend closes connection and exits Meanwhile, once the client receives the "AUTH OK" it initiates an empty query cycle (which is commented as intending to discover whether the database exists!): ... Client receives "AUTH_OK" Client sends "Q " query Client waits for response The problem, of course, is that if the backend manages to exit before the client gets to send its empty query, then the client is writing on a closed connection. Boom, SIGPIPE. I thought about hacking around this by having the postmaster check the validity of the database name before it does the authorization cycle. But that's a bad idea; first because it allows unauthorized users to probe the validity of database names, and second because it only fixes this particular instance of the problem. The general problem is that the FE/BE protocol does not make provision for errors reported by the backend during startup. ISTM there are many ways in which the BE might fail during startup, not all of which could reasonably be checked in advance by the postmaster. So ... since we're altering the protocol anyway ... the right fix is to alter the protocol a little more. Remember that "Z" message that the backend is now sending at the end of every query cycle? What we ought to do is make the BE send "Z" at completion of startup, as well. (In other words, "Z" will really mean "Ready for Query" rather than "Query Done". This is actually easier to implement in postgres.c than the other way.) Now the client's startup procedure looks like ... Client receives "AUTH_OK" Client waits for "Z" ; if get "E" instead, BE startup failed. I suspect it's not really necessary to do an empty query after this, but we may as well leave that in there for additional reliability. regards, tom lane
> Client receives "AUTH_OK" > > Client waits for "Z" ; if get "E" instead, BE startup failed. > > I suspect it's not really necessary to do an empty query after this, > but we may as well leave that in there for additional reliability. I say go without the extra query. We can always add it later if there is a problem. Backend startup time should be as fast as possible, especially for short requests like www. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Bruce Momjian <maillist@candle.pha.pa.us> writes: >> I suspect it's not really necessary to do an empty query after this, >> but we may as well leave that in there for additional reliability. > I say go without the extra query. We can always add it later if there > is a problem. Backend startup time should be as fast as possible, Good point. OK, I'll leave the empty-query code in fe-connect.c, but ifdef it out, and we'll see if anyone has any problems. regards, tom lane
> > > I've sent you some messages in the past regarding the Linux fflush() > > > problem, the one that made psql receive a EPIPE signal. > > > I've downloaded the latest available version 6.3.2 and has the > > > same problem. > > > If you don't remember, there was a fflush() called in fe-misc.c, > > > exactly in pqPuts() function that receives EPIPE and it's currently > > > not ignored. > > > I don't know if this is Linux specific or not, but it'd be great to > > > fix the problem in the distribution. Do you agree? > > Yes, I recall the "broken pipe" problem and thought that someone had > fixed it (most platforms didn't seem to see the problem, but Linux did). > > I'm not currently running v6.3.2, having rev-locked on 980408 to get > some development done for v6.4. Did you supply a patch to fix the > problem earlier? > Well, kind of. I've tracked the problem down to PQexec() and suggested some way to fix it. I've checked the current psql and the problem is when it calls to PQconnectdb(). At this point EPIPE isn't ignored. Inside PQconnectdb() there is a call to PQexec() to see if the database exists. Then PQexec() calls pqPuts() and you get the broken pipe. Before the call to PQconnectdb in psql there isn't any call to pqsignal. Federico Schwindt
> So ... since we're altering the protocol anyway ... the right fix is > to alter the protocol a little more. Remember that "Z" message that > the backend is now sending at the end of every query cycle? What > we ought to do is make the BE send "Z" at completion of startup, > as well. (In other words, "Z" will really mean "Ready for Query" > rather than "Query Done". This is actually easier to implement in > postgres.c than the other way.) Now the client's startup procedure > looks like > > ... > > Client receives "AUTH_OK" > > Client waits for "Z" ; if get "E" instead, BE startup failed. BE fails, client gets SIGPIPE? or client waits forever? -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software 300 Lakeside Drive Oakland, CA 94612 - A child of five could understand this! Fetch me a child of five.
On Mon, 4 May 1998, Tom Lane wrote: [snip] > Meanwhile, once the client receives the "AUTH OK" it initiates > an empty query cycle (which is commented as intending to discover > whether the database exists!): > > ... > > Client receives "AUTH_OK" > > Client sends "Q " query > > Client waits for response > > The problem, of course, is that if the backend manages to exit > before the client gets to send its empty query, then the client > is writing on a closed connection. Boom, SIGPIPE. [snip] > So ... since we're altering the protocol anyway ... the right fix is > to alter the protocol a little more. Remember that "Z" message that > the backend is now sending at the end of every query cycle? What > we ought to do is make the BE send "Z" at completion of startup, > as well. (In other words, "Z" will really mean "Ready for Query" > rather than "Query Done". This is actually easier to implement in > postgres.c than the other way.) Now the client's startup procedure > looks like > > ... > > Client receives "AUTH_OK" > > Client waits for "Z" ; if get "E" instead, BE startup failed. This sounds fair enough. Infact we could then throw a more meaningful error message when this occurs. > I suspect it's not really necessary to do an empty query after this, > but we may as well leave that in there for additional reliability. In JDBC, I replaced (back in 6.2) the empty query with one to get the current DateStyle (which the driver then uses to handle dates correctly). -- Peter T Mount peter@retep.org.uk or petermount@earthling.net Main Homepage: http://www.retep.org.uk ************ Someday I may rebuild this signature completely ;-) ************ Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk
dg@illustra.com (David Gould) writes: >> So ... since we're altering the protocol anyway ... the right fix is >> to alter the protocol a little more. >> >> Client waits for "Z" ; if get "E" instead, BE startup failed. > BE fails, client gets SIGPIPE? or client waits forever? Neither: the client detects EOF on its input and realizes that the backend failed. Already done and tested. (SIGPIPE is only for *write* on a closed channel, not *read*. Read just returns EOF.) regards, tom lane