Thread: Rare SSL failures on eelpout
(Moving this discussion from the pgsql-committers thread "pgsql:
Update ssl test certificates and keys", which is innocent.)

On Wed, Jan 16, 2019 at 10:37 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Jan 4, 2019 at 10:08 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > Thanks.  FWIW I've just updated eelpout (a Debian testing BF animal
> > that runs all the extra tests including SSL) to use libssl-dev
> > (instead of libssl1.0-dev), and cleared its accache files.  Let's
> > see if that works...
>
> Since that upgrade (to libssl 1.1.1a-1), there have been a few
> intermittent failures in the SSL tests on that animal (thanks Tom for
> pointing that out off-list).

I reproduced this manually.  From time to time (less than 1% of the
times I tried), the server hangs up without sending a goodbye message,
thereby causing a failure to match the expected psql error message.

From the kernel's perspective on the psql side, a working-as-expected
case looks like this:

sendto(3, "\27\3\3\2\356\313\334\372\201\353\201h\204\353\240A\4\355\232\340\375\0X\220\326V\374\253\222i\2724"..., 1115, MSG_NOSIGNAL, NULL, 0) = 1115
ppoll([{fd=3, events=POLLOUT|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "\27\3\3\0f\335\313\224\263\256r\371\215\177\273,N\345\342\200\257\r\321\6\323@\316\241\316\17\204\26"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
ppoll([{fd=3, events=POLLIN|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\27\3\3\0\23", 5, 0, NULL, NULL) = 5
recvfrom(3, "I\"\t;\3006\276\347\344]7>\2\234m\340]B\241", 19, 0, NULL, NULL) = 19
close(3)                                = 0
write(2, "psql: SSL error: sslv3 alert cer"..., 49) = 49
psql: SSL error: sslv3 alert certificate revoked
...
and the unexpected failure case looks like this:

sendto(3, "\27\3\3\2\356/\254\307\277\342G?&BB\321\f\314\245\22\246U\337\263;\203f\302s\306\37\276"..., 1115, MSG_NOSIGNAL, NULL, 0) = 1115
ppoll([{fd=3, events=POLLOUT|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLOUT|POLLERR|POLLHUP}])
sendto(3, "\27\3\3\0f\177\335\20\305\353\234\306\211#\343\321\336\22111J\312\323F\210\6U\331\264GAr"..., 107, MSG_NOSIGNAL, NULL, 0) = -1 ECONNRESET (Connection reset by peer)
write(2, "psql: server closed the connecti"..., 199) = 199
psql: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
could not send startup packet: Connection reset by peer

To the kernel on the server side, a good case looks like this:

[pid 13740] sendto(8, "\27\3\3\0\23\252\21\227\2\232s\354\21\243\236\207\301\3B\341\253\306k\346", 24, 0, NULL, 0) = 24
[pid 13740] write(2, "2019-01-22 09:23:04.425 UTC [137"..., 111) = 111
2019-01-22 09:23:04.425 UTC [13740] [unknown] LOG:  could not accept SSL connection: certificate verify failed
[pid 13740] exit_group(0)               = ?
[pid 13740] +++ exited with 0 +++

I didn't manage to trace a bad case on the server side (maybe the
strace interferes with the required timing), but I noticed that
SSL_shutdown()'s return code (which we don't check) is always -1, and
I noticed that SSL_get_error() after that is always SSL_ERROR_SSL.  I
wondered if we might somehow be reaching exit() when the alert
response is still buffered inside OpenSSL, but that doesn't seem
right -- the port has noblock=0 at that stage.

Hmm.  Why is psql doing two sendto() calls without reading a response
in between, when it's possible for the server to exit after the
first, anyway?  Seems like a protocol violation somewhere?

-- 
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Hmm.  Why is psql doing two sendto() calls without reading a response
> in between, when it's possible for the server to exit after the first,
> anyway?  Seems like a protocol violation somewhere?

Keep in mind this is all down inside the SSL handshake, so if any
protocol is being violated, it's theirs not ours.

My gut reaction is that this probably indicates that in the
"certificate verify failed" code path, we're exiting the server too
soon without letting openssl finish out its handshake fully.  But
that could be all wet, or even if true it might not be convenient to
postpone exit (e.g., we'd have to save the SSL error code somewhere,
I suspect).

The whole thing reminds me of the recent bug #15598:

https://www.postgresql.org/message-id/87k1iy44fd.fsf%40news-spur.riddles.org.uk

			regards, tom lane
On Wed, Jan 23, 2019 at 4:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > Hmm.  Why is psql doing two sendto() calls without reading a response
> > in between, when it's possible for the server to exit after the first,
> > anyway?  Seems like a protocol violation somewhere?
>
> Keep in mind this is all down inside the SSL handshake, so if any
> protocol is being violated, it's theirs not ours.

The sendto() of 1115 bytes is SSL_connect()'s last syscall, just
before it returns 1 to indicate success (even though it wasn't
successful?), without waiting for a further response.  The sendto()
of 107 bytes is our start-up packet, which either succeeds and is
followed by reading a "certificate revoked" message from the server,
or fails with ECONNRESET if the socket has already been shut down at
the server end due to the racing exit.

It seems very strange to me that the error report is deferred until
we send our start-up packet.  It seems like a response that belongs
to the connection attempt, not our later data sending.  Bug in
OpenSSL?  Unintended consequence of our switch to blocking IO at that
point?  I tried to find out how this looked on 1.0.2, but it looks
like Debian has just removed the older version from the buster distro
and I'm out of time to hunt this on other OSes today.

> The whole thing reminds me of the recent bug #15598:
>
> https://www.postgresql.org/message-id/87k1iy44fd.fsf%40news-spur.riddles.org.uk

Yeah, if errors get moved to later exchanges but the server might
exit and close its end of the socket before we can manage to initiate
a later exchange, it starts to look just like that.
A less interesting bug is the appearance of 3 nonsensical "Success"
(glibc) or "No error: 0" (FreeBSD) error messages in the server logs
on systems running OpenSSL 1.1.1, much like this, which I guess might
mean EOF:

https://www.postgresql.org/message-id/CAEepm=3cc5wYv=X4Nzy7VOUkdHBiJs9bpLzqtqJWxdDUp5DiPQ@mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com
On Wed, Jan 23, 2019 at 11:23 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jan 23, 2019 at 4:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The whole thing reminds me of the recent bug #15598:
> >
> > https://www.postgresql.org/message-id/87k1iy44fd.fsf%40news-spur.riddles.org.uk
>
> Yeah, if errors get moved to later exchanges but the server might exit
> and close its end of the socket before we can manage to initiate a
> later exchange, it starts to look just like that.

Based on some clues from Andrew Gierth (in the email referenced above
and also in an off-list chat), I did some experimentation that seemed
to confirm a theory of his that Linux might be taking a shortcut when
both sides are local, bypassing the RST step because it can see both
ends (whereas normally the TCP stack should cause the *next* sendto()
to fail IIUC?).  I think this case is essentially the same as bug
#15598, it's just happening at a different time.

With a simple socket test program I can see that if you send a single
packet after the remote end has closed and after it had already read
everything you sent it up to now, you get EPIPE.  If there was some
outstanding data from a previous send that it hadn't read yet when it
closed its end, you get ECONNRESET.  This doesn't happen if client
and server are on different machines, or on FreeBSD even on the same
machine, but does happen if client and server are on the same Linux
system (whether using the loopback interface or a real network
interface).  However, after you get ECONNRESET, you can still read
the final data that was sent by the server before it closed, which
presumably contains the error we want to report to the user.  That
suggests that we could perhaps handle ECONNRESET both at startup
packet send time (for certificate rejection, eelpout's case) and at
initial query send (for idle timeout, bug #15598's case) by
attempting to read.  Does that make sense?
I haven't poked into the libpq state machine stuff to see if that
would be easy or hard.

PS: looking again at the strace output from earlier, it's kinda funny
that it says revents=POLLOUT|POLLERR|POLLHUP, since that seems to be
a contradiction: if this were poll() and not ppoll() I think it might
violate POSIX, which says "[POLLHUP] and POLLOUT are
mutually-exclusive; a stream can never be writable if a hangup has
occurred", but I don't see what we could do differently with that
anyway.

-- 
Thomas Munro
https://enterprisedb.com
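[Editorial note: the "simple socket test program" mentioned above is not shown in the thread.  The following is a minimal Python re-sketch of the same experiment, not the original program; the exact errnos depend on the OS and kernel, which is the whole point of the observation, so it only reports what it sees.]

```python
import socket
import time

def probe(read_before_close):
    """Open a loopback TCP pair, optionally have the 'server' read what
    we sent, close the server end, then see what later send()s report."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))       # kernel picks a free port
    srv.listen(1)
    client = socket.create_connection(srv.getsockname())
    server, _ = srv.accept()
    srv.close()

    client.send(b"hello")
    time.sleep(0.1)                  # let the data arrive
    if read_before_close:
        server.recv(16)              # drain it, so close() can be graceful
    server.close()                   # with unread data, this sends RST
    time.sleep(0.1)                  # let any FIN/RST come back

    try:
        client.send(b"again")        # may still be accepted on some stacks...
        time.sleep(0.1)
        client.send(b"again")        # ...but a later send sees the failure
        errno = None                 # e.g. FreeBSD, or a remote peer
    except OSError as e:
        errno = e.errno              # EPIPE (32) or ECONNRESET (104)
    client.close()
    return errno

drained = probe(read_before_close=True)
undrained = probe(read_before_close=False)
print("peer read everything before closing:", drained)
print("peer closed with unread data:", undrained)
```

On the same-machine Linux case described above one would expect EPIPE in the first case and ECONNRESET in the second; on FreeBSD both sends may simply succeed.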
Thomas Munro <thomas.munro@gmail.com> writes:
> With a simple socket test program I can see that if you send a single
> packet after the remote end has closed and after it had already read
> everything you sent it up to now, you get EPIPE.  If there was some
> outstanding data from a previous send that it hadn't read yet when it
> closed its end, you get ECONNRESET.  This doesn't happen if client and
> server are on different machines, or on FreeBSD even on the same
> machine, but does happen if client and server are on the same Linux
> system (whether using the loopback interface or a real network
> interface).  However, after you get ECONNRESET, you can still read the
> final data that was sent by the server before it closed, which
> presumably contains the error we want to report to the user.  That
> suggests that we could perhaps handle ECONNRESET both at startup
> packet send time (for certificate rejection, eelpout's case) and at
> initial query send (for idle timeout, bug #15598's case) by attempting
> to read.  Does that make sense?

Hmm ... it definitely makes sense that we shouldn't assume that a
*write* failure means there is nothing left to *read*.  I would say
that's generally true regardless of the specific errno.

How about the opposite case, btw --- should we continue to try to
flush data after a read error?  Or is it safe to assume that means
the other end is gone?

> I haven't poked into the libpq state
> machine stuff to see if that would be easy or hard.

Me either.

			regards, tom lane
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> That suggests that we could perhaps handle ECONNRESET both at startup
>> packet send time (for certificate rejection, eelpout's case) and at
>> initial query send (for idle timeout, bug #15598's case) by attempting
>> to read.  Does that make sense?

> Hmm ... it definitely makes sense that we shouldn't assume that a *write*
> failure means there is nothing left to *read*.

After staring at the code for awhile, I am thinking that there may be
a bug of that ilk, but if so it's down inside OpenSSL.  Perhaps it's
specific to the OpenSSL version you're using on eelpout?  There is
not anything we could do differently in libpq, AFAICS, because it's
OpenSSL's responsibility to read any data that might be available.

I also looked into the idea that we're doing something wrong on the
server side, allowing the final error message to not get flushed out.
A plausible theory there is that SSL_shutdown is returning a
WANT_READ or WANT_WRITE error and we should retry it ... but that
doesn't square with your observation upthread that it's returning
SSL_ERROR_SSL.

It's all very confusing, but I think there's a nontrivial chance that
this is an OpenSSL bug, especially since we haven't been able to
replicate it elsewhere.

			regards, tom lane
On Tue, Mar 5, 2019 at 10:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> >> That suggests that we could perhaps handle ECONNRESET both at startup
> >> packet send time (for certificate rejection, eelpout's case) and at
> >> initial query send (for idle timeout, bug #15598's case) by attempting
> >> to read.  Does that make sense?
>
> > Hmm ... it definitely makes sense that we shouldn't assume that a *write*
> > failure means there is nothing left to *read*.
>
> After staring at the code for awhile, I am thinking that there may be
> a bug of that ilk, but if so it's down inside OpenSSL.  Perhaps it's
> specific to the OpenSSL version you're using on eelpout?  There is
> not anything we could do differently in libpq, AFAICS, because it's
> OpenSSL's responsibility to read any data that might be available.
>
> I also looked into the idea that we're doing something wrong on the
> server side, allowing the final error message to not get flushed out.
> A plausible theory there is that SSL_shutdown is returning a WANT_READ
> or WANT_WRITE error and we should retry it ... but that doesn't square
> with your observation upthread that it's returning SSL_ERROR_SSL.
>
> It's all very confusing, but I think there's a nontrivial chance
> that this is an OpenSSL bug, especially since we haven't been able
> to replicate it elsewhere.

Hmm.  Yes, it is strange that we haven't seen it elsewhere, but
remember that very few animals are running the ssl tests; also it
requires particular timing to hit.

OK, here's something.
I can reproduce it quite easily on this machine, and I can fix it
like this:

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index f29202db5f..e9c137f1bd 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2705,6 +2705,7 @@ keep_going: /* We will come back to here until there is
 							  libpq_gettext("could not send startup packet: %s\n"),
 							  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
 				free(startpacket);
+				pqHandleSendFailure(conn);
 				goto error_return;
 			}

If I add some printf debugging in there, I can see that block being
reached every hundred or so times I try to connect with a revoked
certificate, and with that extra call to pqHandleSendFailure() in
there the error comes out as it should:

psql: SSL error: sslv3 alert certificate revoked

Now I'm confused because we already have handling like that in
PQsendQuery(), so I can't explain bug #15598.

-- 
Thomas Munro
https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Mar 5, 2019 at 10:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's all very confusing, but I think there's a nontrivial chance
>> that this is an OpenSSL bug, especially since we haven't been able
>> to replicate it elsewhere.

> Hmm.  Yes, it is strange that we haven't seen it elsewhere, but
> remember that very few animals are running the ssl tests; also it
> requires particular timing to hit.

True.  I've spent some time today running the ssl tests on various
machines here, without any luck reproducing.

> OK, here's something.  I can reproduce it quite easily on this
> machine, and I can fix it like this:

> 							  libpq_gettext("could not send startup packet: %s\n"),
> 							  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> 				free(startpacket);
> +				pqHandleSendFailure(conn);
> 				goto error_return;
> 			}

Yeah.  But I don't like pqHandleSendFailure all that much: it has
strong constraints on what state libpq has to be in, as a consequence
of which it's called from a bunch of ad-hoc places, and can't be
called from some others.  It's kind of accidental that it will work
here.

I was toying with the idea of getting rid of that function in favor
of a design like this:

* Handle any socket-write failure at some low level of libpq by
recording the fact that the error occurred (plus whatever data we
need to produce a message about it), and then starting to just
discard output data.

* Eventually, we'll try to read.  Process any available input data as
usual (and, if we get a read error, report that).  When no more input
data is available, if a socket write failure has been recorded,
report that much as if it were an incoming ERROR message, and then
shut down the connection.

This would essentially give us pqHandleSendFailure-like behavior
across the board, which might be enough to fix this problem as well
as bug #15598.  Or not ... as you say, we haven't thoroughly
understood either issue, so it's possible this wouldn't do it.
This would have the effect of prioritizing reports of socket read
failures over those of write failures, which is an interesting
behavioral change, but offhand I can't see a problem with it.

One thing that isn't real clear to me is how much timing sensitivity
there is in "when no more input data is available".  Can we assume
that if we've gotten ECONNRESET or an allied error from a write, then
any data the far end might've wished to send us is already available
to read?  In principle, since TCP allows shutting down either
transmission direction independently, the server could close the read
side of its socket while continuing to transmit data.  In practice,
PG servers don't do that; but could network timing glitches create
the same effect?  Even if it's possible, does it happen enough to
worry about?

The reason I'm concerned is that I don't think it'd be bright to
ignore a send error until we see input EOF, which'd be the obvious
way to solve a timing problem if there is one.  If our send error was
some transient glitch and the connection is really still open, then
we might not get EOF, but we won't get a server reply either because
no message went to the server.  You could imagine waiting some small
interval of time before deciding that it's time to report the write
failure, but ugh.

In any case, consuming immediately-available data before reporting
the write error would already be a step forward over what we're doing
now, I think.

Thoughts?

			regards, tom lane
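[Editorial note: the two-bullet proposal above can be sketched as a tiny state machine.  This is an illustration of the idea only, in Python with invented names, and bears no relation to libpq's actual structure.]

```python
import socket

class Conn:
    """Sketch of 'record the write failure, keep reading, report it last'."""

    def __init__(self, sock):
        self.sock = sock
        self.write_error = None          # remembered, not yet reported

    def send_data(self, data):
        if self.write_error is not None:
            return                       # already failed once: discard output
        try:
            self.sock.sendall(data)
        except OSError as e:
            self.write_error = e         # record it and carry on

    def read_reply(self):
        try:
            data = self.sock.recv(8192)
        except OSError as e:
            raise RuntimeError("read failed: %s" % e)   # read errors win
        if data:
            return data                  # the server's parting words
        # Only when there is nothing left to read do we surface the
        # deferred write failure, as if it were an incoming ERROR.
        if self.write_error is not None:
            raise RuntimeError("could not send: %s" % self.write_error)
        raise RuntimeError("server closed the connection unexpectedly")

# Demonstration, with a Unix socket pair standing in for the connection:
# the "server" sends an error message and exits before we send anything.
a, b = socket.socketpair()
b.sendall(b"ERROR: certificate revoked")
b.close()

conn = Conn(a)
conn.send_data(b"startup packet")        # fails, but is only recorded
reply = conn.read_reply()                # the buffered message still comes out
print(reply)
```

The point of the demonstration is that the send failure is invisible to the caller until the input side has been drained, so the server's final message wins over the less informative write error.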
On Tue, Mar 5, 2019 at 11:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > OK, here's something.  I can reproduce it quite easily on this
> > machine, and I can fix it like this:
>
> > 							  libpq_gettext("could not send startup packet: %s\n"),
> > 							  SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> > 				free(startpacket);
> > +				pqHandleSendFailure(conn);
> > 				goto error_return;
> > 			}
>
> Yeah.  But I don't like pqHandleSendFailure all that much: it has strong
> constraints on what state libpq has to be in, as a consequence of which
> it's called from a bunch of ad-hoc places, and can't be called from
> some others.  It's kind of accidental that it will work here.
>
> I was toying with the idea of getting rid of that function in
> favor of a design like this:
>
> * Handle any socket-write failure at some low level of libpq by
> recording the fact that the error occurred (plus whatever data we
> need to produce a message about it), and then starting to just
> discard output data.
>
> * Eventually, we'll try to read.  Process any available input data
> as usual (and, if we get a read error, report that).  When no more
> input data is available, if a socket write failure has been recorded,
> report that much as if it were an incoming ERROR message, and then
> shut down the connection.
>
> This would essentially give us pqHandleSendFailure-like behavior
> across the board, which might be enough to fix this problem as well as
> bug #15598.  Or not ... as you say, we haven't thoroughly understood
> either issue, so it's possible this wouldn't do it.
>
> This would have the effect of prioritizing reports of socket read
> failures over those of write failures, which is an interesting
> behavioral change, but offhand I can't see a problem with it.
That seems to recreate (and extend) the behaviour that we see on
other TCP stacks, or on Linux with a remote connection, namely that
the first sendto() succeeds (and even though future sendto() calls
may fail, we don't usually get to that because we read and discover
an application-level error message or whatever).

> One thing that isn't real clear to me is how much timing sensitivity
> there is in "when no more input data is available".  Can we assume that
> if we've gotten ECONNRESET or an allied error from a write, then any
> data the far end might've wished to send us is already available to
> read?  In principle, since TCP allows shutting down either transmission
> direction independently, the server could close the read side of its
> socket while continuing to transmit data.  In practice, PG servers
> don't do that; but could network timing glitches create the same effect?
> Even if it's possible, does it happen enough to worry about?

That is beyond my knowledge of networking; I have CC'd Andrew Gierth
in case he has something to say about that.  It certainly seems
sensible to assume that eg RST must follow any data that the other
end sent before closing, if it did indeed call close(), and in our
case we know that it never calls shutdown(SHUT_RD), so the only other
possibility seems to be that it crashed or the connection was lost.
So it seems safe to assume that we can read the server's parting
words after we've had an EPIPE/ECONNRESET error.

Based on some quick googling, it looks like SHUT_RD doesn't actually
send anything anyway (unlike SHUT_WR, which sends FIN and would
result in EOF at the other end), so I'm not sure if SHUT_RD even
"exists" outside the mind of the TCP stack doing it, and therefore
I'm not sure if there is any "timing glitch" that could resemble it.
But I don't know.
> The reason I'm concerned is that I don't think it'd be bright to ignore a
> send error until we see input EOF, which'd be the obvious way to solve a
> timing problem if there is one.  If our send error was some transient
> glitch and the connection is really still open, then we might not get EOF,
> but we won't get a server reply either because no message went to the
> server.  You could imagine waiting some small interval of time before
> deciding that it's time to report the write failure, but ugh.
>
> In any case, consuming immediately-available data before reporting the
> write error would already be a step forward over what we're doing now,
> I think.
>
> Thoughts?

It seems reasonable to me on the grounds that sending data doesn't
mean the other guy got it anyway, and we can see that on other TCP
stacks.  Preferring errors resulting from reads seems sensible
because it's exactly what we want in these cases, and I can't
immediately think of a downside.

It's bigger surgery than I was thinking of, but it seems like it
removes some pre-existing complicated code and replaces it with
something simple, so that seems to be a win.

-- 
Thomas Munro
https://enterprisedb.com
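[Editorial note: the claim above that SHUT_RD is invisible on the wire while SHUT_WR sends FIN can be checked with a few lines of Python.  This is my sketch, not from the thread, and the "invisible" part reflects what Linux appears to do rather than anything TCP guarantees.]

```python
import socket

# Set up a loopback TCP connection entirely within this process.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
client = socket.create_connection(srv.getsockname())
server, _ = srv.accept()
srv.close()

server.shutdown(socket.SHUT_RD)   # purely local: no segment goes out
sent = client.send(b"x")          # the peer can't tell; the send succeeds
print("send after peer's SHUT_RD:", sent)

server.shutdown(socket.SHUT_WR)   # this one sends FIN...
eof = client.recv(16)             # ...so the peer reads EOF
print("recv after peer's SHUT_WR:", repr(eof))

client.close()
server.close()
```

If SHUT_RD caused anything observable at the other end, the first send would fail; it doesn't, which is consistent with the "exists only in the mind of the TCP stack" reading above.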
On Tue, Mar 5, 2019 at 11:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> True.  I've spent some time today running the ssl tests on various
> machines here, without any luck reproducing.

BTW, while looking for other failures on the buildfarm I noticed that
even for eelpout it's only happening on master and REL_11_STABLE:

https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=200&branch=&stage=sslCheck&filter=Submit

Disappointingly, that turned out to be just because 10 and earlier
didn't care what the error message said.

-- 
Thomas Munro
https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes:
> BTW, I went looking for other failures on the buildfarm I noticed that
> even for eelpout it's only happening on master and REL_11_STABLE:

Yeah, I'd noticed that.

> Disappointingly, that turned out to be just because 10 and earlier
> didn't care what the error message said.

That is, you can reproduce the failure on old branches?  That lets
out a half-theory I'd had, which was that Andres' changes to make the
backend always run its socket in nonblock mode had had something to
do with it.  (Those changes do represent a plausible reason why
SSL_shutdown might be returning WANT_READ/WANT_WRITE; but I'm not in
a hurry to add such code without evidence that it actually happens
and something useful would change if we retry.)

			regards, tom lane
On Wed, Mar 6, 2019 at 3:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Disappointingly, that turned out to be just because 10 and earlier
> > didn't care what the error message said.
>
> That is, you can reproduce the failure on old branches?  That lets
> out a half-theory I'd had, which was that Andres' changes to make
> the backend always run its socket in nonblock mode had had something
> to do with it.  (Those changes do represent a plausible reason why
> SSL_shutdown might be returning WANT_READ/WANT_WRITE; but I'm not
> in a hurry to add such code without evidence that it actually
> happens and something useful would change if we retry.)

Yes, on REL_10_STABLE:

$ for i in `seq 1 1000` ; do
    psql "host=localhost port=56024 dbname=certdb user=postgres sslcert=ssl/client-revoked.crt sslkey=ssl/client-revoked.key"
  done
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
...
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
could not send startup packet: Connection reset by peer
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked

Zooming in with strace:

sendto(3, "\27\3\3\2\356\r\214\352@\21\320\202\236}\376\367\262\227\177\255\212\204`q\254\108\326\201+c)"..., 1115, MSG_NOSIGNAL, NULL, 0) = 1115
ppoll([{fd=3, events=POLLOUT|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLOUT|POLLERR|POLLHUP}])
sendto(3, "\27\3\3\0cW_\210\337Q\227\360\216k\221\346\372pw\27\325P\203\357\245km\304Rx\355\200"..., 104, MSG_NOSIGNAL, NULL, 0) = -1 ECONNRESET (Connection reset by peer)

You can see that poll() already knew the other end had closed the
socket.  Since this is clearly timing... let's see, yeah, I can make
it fail every time by adding sleep(1) before the comment "Send the
startup packet.".  I assume that'll work on any Linux machine?

To set this test up, I ran a server with the following config:

ssl=on
ssl_ca_file='root+client_ca.crt'
ssl_cert_file='server-cn-only.crt'
ssl_key_file='server-cn-only.key'
ssl_crl_file='root+client.crl'

I copied those files out of src/test/ssl/ssl/.  Then I ran the psql
command shown earlier.  I think I had to chmod 600 the keys.

-- 
Thomas Munro
https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes:
> You can see that poll() already knew the other end had closed the
> socket.  Since this is clearly timing... let's see, yeah, I can make
> it fail every time by adding sleep(1) before the comment "Send the
> startup packet.".  I assume that'll work on any Linux machine?

Great idea, but no cigar --- doesn't do anything for me except make
the ssl test really slow.  (I tried it on RHEL6 and Fedora 28 and,
just for luck, current macOS.)  What this seems to prove is that the
thing that's different about eelpout is the particular kernel it's
running, and that that kernel has some weird timing behavior in this
situation.

I've also been experimenting with reducing libpq's SO_SNDBUF setting
on the socket, with more or less the same idea of making the sending
of the startup packet slower.  No joy there either.

Annoying.  I'd be happier about writing code to fix this if I could
reproduce it :-(

			regards, tom lane

PS: but now I'm wondering about trying other non-Linux kernels.
On Wed, Mar 6, 2019 at 6:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > You can see that poll() already knew the other end had closed the
> > socket.  Since this is clearly timing... let's see, yeah, I can make
> > it fail every time by adding sleep(1) before the comment "Send the
> > startup packet.".  I assume that'll work on any Linux machine?
>
> Great idea, but no cigar --- doesn't do anything for me except make
> the ssl test really slow.  (I tried it on RHEL6 and Fedora 28 and, just
> for luck, current macOS.)  What this seems to prove is that the thing
> that's different about eelpout is the particular kernel it's running,
> and that that kernel has some weird timing behavior in this situation.
>
> I've also been experimenting with reducing libpq's SO_SNDBUF setting
> on the socket, with more or less the same idea of making the sending
> of the startup packet slower.  No joy there either.
>
> Annoying.  I'd be happier about writing code to fix this if I could
> reproduce it :-(

Hmm.  Note that eelpout only started doing it with OpenSSL 1.1.1.
But I just tried the sleep(1) trick on an x86 box running the same
version of Debian, OpenSSL etc and it didn't work.  So eelpout (a
super cheap virtualised 4-core ARMv8 system rented from scaleway.com
running Debian Buster with a kernel identifying itself as
4.9.23-std-1 and libc6 2.28-7) is indeed starting to look pretty
weird.  Let me know if you want to log in and experiment on that
machine.

-- 
Thomas Munro
https://enterprisedb.com
On Wed, Mar 6, 2019 at 7:05 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Mar 6, 2019 at 6:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Annoying.  I'd be happier about writing code to fix this if I could
> > reproduce it :-(
>
> Hmm.  Note that eelpout only started doing it with OpenSSL 1.1.1.

Bleugh.  I think this OpenSSL package might just be buggy on ARM.  On
x86, apparently with the same version of OpenSSL and all other
details of the test the same, I can see that SSL_connect() returns
<= 0 (failure), and then we ask for that cert revoked message
directly and never even reach the startup packet sending code.  On
ARM, SSL_connect() returns 1 (success) and then we proceed as
discussed and eventually get the error later (or not).  So I think I
should figure out a minimal repro and report this to them.

-- 
Thomas Munro
https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes:
> Bleugh.  I think this OpenSSL package might just be buggy on ARM.  On
> x86, apparently the same version of OpenSSL and all other details of
> the test the same, I can see that SSL_connect() returns <= 0
> (failure), and then we ask for that cert revoked message directly and
> never even reach the startup packet sending code.  On ARM,
> SSL_connect() returns 1 (success) and then we proceed as discussed and
> eventually get the error later (or not).  So I think I should figure
> out a minimal repro and report this to them.

Yeah, I've still been unable to reproduce even with the sleep idea,
so eelpout is definitely looking like a special snowflake from here.
In any case, there seems little doubt that getting past
SSL_connect() when the cert check has failed is an OpenSSL bug; I
don't feel a need to create a workaround for it.

The bug #15598 report is more troublesome, as we don't have a strong
reason to believe it's not common on Windows.  However, I wonder
whether we can really do anything at all about that one.  If I
understand what Andrew was hypothesizing in that thread, it was that
Windows might be dropping undelivered data on the floor once the
server closes its end of the connection.  That would be totally
broken behavior, but I never expect anything else from Microsoft :-(.
If that is an accurate theory then rewriting libpq won't fix it.

I still think the redesign I suggested upthread would make things
cleaner, but I don't have time or interest to make it happen in the
near future if it's not fixing an observable bug.

			regards, tom lane
On Wed, Mar 6, 2019 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The bug #15598 report is more troublesome, as we don't have a strong
> reason to believe it's not common on Windows.  However, I wonder whether
> we can really do anything at all about that one.  If I understand what
> Andrew was hypothesizing in that thread, it was that Windows might be
> dropping undelivered data on the floor once the server closes its end
> of the connection.  That would be totally broken behavior, but I never
> expect anything else from Microsoft :-(.  If that is an accurate theory
> then rewriting libpq won't fix it.

Here is a stupid Python 2.7 program to try to test that.  Run one
copy of it like this:

$ python ./test.py --server

The server will wait for a client, send a message immediately, and
then close the socket after a second.  The client will connect and
send something once before and twice after the server closes the
socket, and finally see if it can read the message from the server.
Here's the output I get from the client on some different systems
(the server was running on the same system):

$ uname -a
Linux debian 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64 GNU/Linux
$ python ./test.py --client
Sending A...
2
Sending B...
[Errno 104] Connection reset by peer
Sending C...
[Errno 32] Broken pipe
This is the server saying goodbye

$ uname -a
Darwin macaque.local 18.2.0 Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64 x86_64
$ python2.7 ./test.py --client
Sending A...
2
Sending B...
[Errno 32] Broken pipe
Sending C...
[Errno 32] Broken pipe
This is the server saying goodbye

$ uname -a
FreeBSD dogmatix 13.0-CURRENT FreeBSD 13.0-CURRENT c0873ea614a(master) GENERIC amd64
$ python2.7 ./test.py --client
Sending A...
2
Sending B...
2
Sending C...
2
This is the server saying goodbye

So... can anyone tell us what happens on Windows?
(A secondary question might be what happens if the server and client are
on different machines, since I guess it could be different?)

--
Thomas Munro
https://enterprisedb.com
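The script itself travelled as an attachment and isn't reproduced in this archive copy. A Python 3 sketch of the experiment as described above could look like the following; the port number, sleep durations, and exact messages are my reconstruction, not the original attachment (which was Python 2.7):

```python
import socket
import sys
import time

# Arbitrary local endpoint; the original attachment's exact values are unknown.
HOST, PORT = "127.0.0.1", 54123

def server():
    ls = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ls.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    ls.bind((HOST, PORT))
    ls.listen(1)
    conn, _ = ls.accept()
    # Send the "goodbye" message immediately, then hang up a second later,
    # quite possibly with unread client data still in our receive buffer
    # (which is what provokes an abortive close on some stacks).
    conn.sendall(b"This is the server saying goodbye")
    time.sleep(1)
    conn.close()
    ls.close()

def client():
    results = []
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    for i, label in enumerate((b"A", b"B", b"C")):
        if i == 1:
            time.sleep(1.5)  # make sure the server has hung up by now
        print("Sending %s..." % label.decode())
        try:
            results.append(str(s.send(label + b"\n")))  # 2 bytes on success
        except OSError as e:
            results.append(str(e))
        print(results[-1])
    # The interesting part: can we still read the goodbye message the
    # server sent before it closed, or has the stack thrown it away?
    try:
        results.append(s.recv(1024).decode())
    except OSError as e:
        results.append(str(e))
    print(results[-1])
    return results

if __name__ == "__main__":
    if "--server" in sys.argv:
        server()
    elif "--client" in sys.argv:
        client()
```

Run `python3 test.py --server` in one terminal and `python3 test.py --client` in another; whether the client's final line shows the goodbye message or an error is exactly the per-OS difference reported above.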
Attachments
On Wed, Mar 06, 2019 at 11:13:31AM +1300, Thomas Munro wrote:
> So... can anyone tell us what happens on Windows?

Voila!

C:\Users\Shawn Debnath\Desktop>systeminfo
OS Name:      Microsoft Windows 10 Pro
OS Version:   10.0.17763 N/A Build 17763
[...]

C:\Users\Shawn Debnath\Desktop>c:\Python27\python.exe tmunro-ssl-test.py --server

C:\Users\Shawn Debnath\Desktop>c:\Python27\python.exe tmunro-ssl-test.py --client
Sending A...
2
Sending B...
[Errno 10054] An existing connection was forcibly closed by the remote host
Sending C...
[Errno 10054] An existing connection was forcibly closed by the remote host
Traceback (most recent call last):
  File "tmunro-ssl-test.py", line 57, in <module>
    client()
  File "tmunro-ssl-test.py", line 51, in client
    print s.recv(1024)
socket.error: [Errno 10054] An existing connection was forcibly closed by the remote host

--
Shawn Debnath
Amazon Web Services (AWS)
On Wed, Mar 6, 2019 at 4:07 PM Shawn Debnath <sdn@amazon.com> wrote:
> On Wed, Mar 06, 2019 at 11:13:31AM +1300, Thomas Munro wrote:
> > So... can anyone tell us what happens on Windows?
>
> C:\Users\Shawn Debnath\Desktop>c:\Python27\python.exe tmunro-ssl-test.py --client
> Sending A...
> 2
> Sending B...
> [Errno 10054] An existing connection was forcibly closed by the remote host
> Sending C...
> [Errno 10054] An existing connection was forcibly closed by the remote host
> Traceback (most recent call last):
>   File "tmunro-ssl-test.py", line 57, in <module>
>     client()
>   File "tmunro-ssl-test.py", line 51, in client
>     print s.recv(1024)
> socket.error: [Errno 10054] An existing connection was forcibly closed by the remote host

Thanks!  Wow, so not only can we not read the final message sent by the
server, if we try we get an error.  That's... not what we want.  I
wonder if there might be a way to put the socket into don't-do-that
mode...

--
Thomas Munro
https://enterprisedb.com
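For what it's worth, the abortive close that makes Windows junk the buffered goodbye is typically provoked by closing a socket that still holds unread data in its receive buffer: the stack answers with an RST, and an RST licenses the peer to discard queued data. So one conceivable "don't-do-that mode" lives on the *closing* side rather than the client: half-close with shutdown() and drain the peer before the final close. This is only an illustrative sketch of that idea, not anything PostgreSQL does (and a server exiting on error can't always afford to wait around for the peer):

```python
import socket

def goodbye_then_close(conn):
    """Send a farewell, then close without provoking an abortive close.

    Illustrative sketch only.  The RST that can make the peer's stack
    discard the buffered goodbye is typically triggered by closing a
    socket that still has unread data in its receive buffer; half-closing
    with shutdown() and draining the peer first avoids that.
    """
    conn.sendall(b"This is the server saying goodbye")
    conn.shutdown(socket.SHUT_WR)  # send FIN: "no more data from me"
    # Drain whatever the peer has sent (or goes on to send) until it
    # closes its side too, so close() finds no unread data and the
    # teardown stays graceful rather than abortive.
    while conn.recv(4096):
        pass
    conn.close()
```

With this shape of teardown, the peer's final recv() should see the goodbye followed by a clean EOF instead of ECONNRESET, at least on the stacks where this pattern behaves as the TCP half-close rules suggest.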
On Wed, Mar 6, 2019 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Bleugh.  I think this OpenSSL package might just be buggy on ARM.  On
> > x86, apparently the same version of OpenSSL and all other details of
> > the test the same, I can see that SSL_connect() returns <= 0
> > (failure), and then we ask for that cert revoked message directly and
> > never even reach the startup packet sending code.  On ARM,
> > SSL_connect() returns 1 (success) and then we proceed as discussed and
> > eventually get the error later (or not).  So I think I should figure
> > out a minimal repro and report this to them.
>
> Yeah, I've still been unable to reproduce even with the sleep idea,
> so eelpout is definitely looking like a special snowflake from here.
> In any case, there seems little doubt that getting past SSL_connect()
> when the cert check has failed is an OpenSSL bug; I don't feel a need
> to create a workaround for it.

I was wrong: it breaks on an x86 system for me too (either with the
sleep, or with clunky scheduling -- I was running psql under strace when
I saw it).  Not sure what I did wrong last time I tried that.  I opened
a bug report with OpenSSL, let's see what they say:

https://github.com/openssl/openssl/issues/8500

--
Thomas Munro
https://enterprisedb.com
On Sun, Mar 17, 2019 at 2:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Mar 6, 2019 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Yeah, I've still been unable to reproduce even with the sleep idea,
> > so eelpout is definitely looking like a special snowflake from here.
> > In any case, there seems little doubt that getting past SSL_connect()
> > when the cert check has failed is an OpenSSL bug; I don't feel a need
> > to create a workaround for it.
>
> I was wrong: it breaks on an x86 system for me too (either with the
> sleep, or with clunky scheduling, I was running psql under strace when
> I saw it).  Not sure what I did wrong last time I tried that.  I
> opened a bug report with OpenSSL, let's see what they say:
>
> https://github.com/openssl/openssl/issues/8500

This was an intentional change in TLS1.3, reducing round trips by
verifying the client certificate later.  I'm pretty sure the fix I
mentioned earlier -- namely adding an ad-hoc call to
pqHandleSendFailure() if we fail to send the start-up packet -- would
fix eelpout's measles (though obviously wouldn't solve the problem for
Windows given what we have learned about its TCP implementation).  I
should probably go and do that, unless you want to write the more
general handling for send failure you described, and are prepared to
back-patch it?

--
Thomas Munro
https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes:
> On Sun, Mar 17, 2019 at 2:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> I opened a bug report with OpenSSL, let's see what they say:
>> https://github.com/openssl/openssl/issues/8500

> This was an intentional change in TLS1.3, reducing round trips by
> verifying the client certificate later.

Ugh.  So probably we can reproduce it elsewhere if we use cutting-edge
OpenSSL versions.

> I'm pretty sure the fix I mentioned earlier -- namely adding an ad-hoc
> call to pqHandleSendFailure() if we fail to send the start-up packet
> -- would fix eelpout's measles (though obviously wouldn't solve the
> problem for Windows given what we have learned about its TCP
> implementation).  I should probably go and do that, unless you want to
> write the more general handling for send failure you described, and
> are prepared to back-patch it?

Well, I'm not sure about the back-patching aspect of that, but maybe
I should write the patch and then we should see how risky it looks.
Give me a few days ...

			regards, tom lane
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> This was an intentional change in TLS1.3, reducing round trips by
>> verifying the client certificate later.

> Ugh.  So probably we can reproduce it elsewhere if we use cutting-edge
> OpenSSL versions.

I installed OpenSSL 1.1.1a on my Mac laptop.  I got through 100 cycles
of the ssl tests without a problem, which is not too surprising because
longfin has been running on pretty much the exact same software stack
since late November, and it has not shown the problem.  However ...
I threw in the sleep() where you advised in fe-connect.c, and kaboom!

t/001_ssltests.pl .. 67/75
#   Failed test 'certificate authorization fails with revoked client cert: matches'
#   at t/001_ssltests.pl line 375.
#                   'psql: server closed the connection unexpectedly
#         This probably means the server terminated abnormally
#         before or while processing the request.
# could not send startup packet: Broken pipe
# '
#     doesn't match '(?^:SSL error)'
t/001_ssltests.pl .. 74/75
#   Failed test 'intermediate client certificate is missing: matches'
#   at t/001_ssltests.pl line 411.
#                   'psql: server closed the connection unexpectedly
#         This probably means the server terminated abnormally
#         before or while processing the request.
# could not send startup packet: Broken pipe
# '
#     doesn't match '(?^:SSL error)'
# Looks like you failed 2 tests of 75.
t/001_ssltests.pl .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/75 subtests
t/002_scram.pl ..... ok

It seems quite repeatable this way.  So that confirms that it's the
OpenSSL version that is critical, and that you need a very new version
to make it fail.

I shall now see about fixing it...

			regards, tom lane
I wrote:
> ... I don't like pqHandleSendFailure all that much: it has strong
> constraints on what state libpq has to be in, as a consequence of which
> it's called from a bunch of ad-hoc places, and can't be called from
> some others.  It's kind of accidental that it will work here.

> I was toying with the idea of getting rid of that function in
> favor of a design like this:

> * Handle any socket-write failure at some low level of libpq by
> recording the fact that the error occurred (plus whatever data we
> need to produce a message about it), and then starting to just
> discard output data.

> * Eventually, we'll try to read.  Process any available input data
> as usual (and, if we get a read error, report that).  When no more
> input data is available, if a socket write failure has been recorded,
> report that much as if it were an incoming ERROR message, and then
> shut down the connection.

> This would essentially give us pqHandleSendFailure-like behavior
> across the board, which might be enough to fix this problem as well as
> bug #15598.  Or not ... as you say, we haven't thoroughly understood
> either issue, so it's possible this wouldn't do it.

Here's a draft patch along that line.  It's gratifyingly small, and
it does fix the SSL problem for me.  I have no way of investigating
whether it fixes bug #15598, but it might.

Aside from the SSL cert-verify issue, I've checked the behavior when
the backend is shut down ("pg_ctl stop") between queries, and when
the backend is kill 9'd mid-query.  The shutdown case reacts well with
or without SSL:

regression=# select 2+2;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The backend-crash case is fine without SSL:

regression=# select * from tenk1 t1, tenk1 t2, tenk1 t3;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

but a bit less nice with it:

regression=# select * from tenk1 t1, tenk1 t2, tenk1 t3;
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Succeeded.

That seems cosmetic --- we just haven't expended the same effort on
what to print for connection EOF with SSL as without it.  (Also,
you get that same result without this patch.)

Of course, this hardly counts as exhaustive testing, especially since
I only tried one OpenSSL version.

> One thing that isn't real clear to me is how much timing sensitivity
> there is in "when no more input data is available".  Can we assume that
> if we've gotten ECONNRESET or an allied error from a write, then any
> data the far end might've wished to send us is already available to
> read?  In principle, since TCP allows shutting down either transmission
> direction independently, the server could close the read side of its
> socket while continuing to transmit data.  In practice, PG servers
> don't do that; but could network timing glitches create the same effect?
> Even if it's possible, does it happen enough to worry about?

> The reason I'm concerned is that I don't think it'd be bright to ignore a
> send error until we see input EOF, which'd be the obvious way to solve a
> timing problem if there is one.  If our send error was some transient
> glitch and the connection is really still open, then we might not get EOF,
> but we won't get a server reply either because no message went to the
> server.  You could imagine waiting some small interval of time before
> deciding that it's time to report the write failure, but ugh.

As written, this patch does seem vulnerable to this concern, if it's real.
We could address it for post-connection failures by tweaking PQgetResult
so that it doesn't call pqWait if there's already a write error recorded;
but I'm far from sure that that would be a net improvement.  Also, making
a comparable change to the behavior of the PQconnectPoll state machine
might be tricky.

My current feeling is that this is OK to put in HEAD but I think the
risk-reward ratio isn't very good for the back branches.  Even with
an OpenSSL version where this makes a difference, the problematic
behavior is pretty hard to hit.  So I'm a bit inclined to do nothing
in the back branches.

			regards, tom lane

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index c96a52b..e3bf6a7 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -537,6 +537,10 @@ pqDropServerData(PGconn *conn)
 	conn->last_sqlstate[0] = '\0';
 	conn->auth_req_received = false;
 	conn->password_needed = false;
+	conn->write_failed = false;
+	if (conn->write_err_msg)
+		free(conn->write_err_msg);
+	conn->write_err_msg = NULL;
 	conn->be_pid = 0;
 	conn->be_key = 0;
 }
@@ -3702,6 +3706,8 @@ freePGconn(PGconn *conn)
 	/* Note that conn->Pfdebug is not ours to close or free */
 	if (conn->last_query)
 		free(conn->last_query);
+	if (conn->write_err_msg)
+		free(conn->write_err_msg);
 	if (conn->inBuffer)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index ac969e7..93ff428 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -791,6 +791,32 @@ pqSaveErrorResult(PGconn *conn)
 }
 
 /*
+ * As above, and append conn->write_err_msg to whatever other error we have.
+ * This is used when we've detected a write failure and have exhausted our
+ * chances of reporting something else instead.
+ */
+static void
+pqSaveWriteError(PGconn *conn)
+{
+	/*
+	 * Ensure conn->result is an error result, and add anything in
+	 * conn->errorMessage to it.
+	 */
+	pqSaveErrorResult(conn);
+
+	/*
+	 * Now append write_err_msg to that.  If it's null because of previous
+	 * strdup failure, do what we can.  (It's likely our machinations here are
+	 * all getting OOM failures as well, but ...)
+	 */
+	if (conn->write_err_msg)
+		pqCatenateResultError(conn->result, conn->write_err_msg);
+	else
+		pqCatenateResultError(conn->result,
+							  libpq_gettext("write to server failed\n"));
+}
+
+/*
  * This subroutine prepares an async result object for return to the caller.
  * If there is not already an async result object, build an error object
  * using whatever is in conn->errorMessage.  In any case, clear the async
@@ -1224,7 +1250,7 @@ PQsendQuery(PGconn *conn, const char *query)
 		pqPuts(query, conn) < 0 ||
 		pqPutMsgEnd(conn) < 0)
 	{
-		pqHandleSendFailure(conn);
+		/* error message should be set up already */
 		return 0;
 	}
 
@@ -1243,7 +1269,7 @@ PQsendQuery(PGconn *conn, const char *query)
 	 */
 	if (pqFlush(conn) < 0)
 	{
-		pqHandleSendFailure(conn);
+		/* error message should be set up already */
 		return 0;
 	}
 
@@ -1389,7 +1415,7 @@ PQsendPrepare(PGconn *conn,
 	return 1;
 
 sendFailed:
-	pqHandleSendFailure(conn);
+	/* error message should be set up already */
 	return 0;
 }
 
@@ -1641,40 +1667,11 @@ PQsendQueryGuts(PGconn *conn,
 	return 1;
 
 sendFailed:
-	pqHandleSendFailure(conn);
+	/* error message should be set up already */
 	return 0;
 }
 
 /*
- * pqHandleSendFailure: try to clean up after failure to send command.
- *
- * Primarily, what we want to accomplish here is to process any ERROR or
- * NOTICE messages that the backend might have sent just before it died.
- * Since we're in IDLE state, all such messages will get sent to the notice
- * processor.
- *
- * NOTE: this routine should only be called in PGASYNC_IDLE state.
- */
-void
-pqHandleSendFailure(PGconn *conn)
-{
-	/*
-	 * Accept and parse any available input data, ignoring I/O errors.  Note
-	 * that if pqReadData decides the backend has closed the channel, it will
-	 * close our side of the socket --- that's just what we want here.
-	 */
-	while (pqReadData(conn) > 0)
-		parseInput(conn);
-
-	/*
-	 * Be sure to parse available input messages even if we read no data.
-	 * (Note: calling parseInput within the above loop isn't really necessary,
-	 * but it prevents buffer bloat if there's a lot of data available.)
-	 */
-	parseInput(conn);
-}
-
-/*
  * Select row-by-row processing mode
  */
 int
@@ -1763,8 +1760,11 @@ PQisBusy(PGconn *conn)
 	/* Parse any available data, if our state permits. */
 	parseInput(conn);
 
-	/* PQgetResult will return immediately in all states except BUSY. */
-	return conn->asyncStatus == PGASYNC_BUSY;
+	/*
+	 * PQgetResult will return immediately in all states except BUSY, or if we
+	 * had a write failure.
+	 */
+	return conn->asyncStatus == PGASYNC_BUSY || conn->write_failed;
 }
 
@@ -1804,7 +1804,13 @@ PQgetResult(PGconn *conn)
 			}
 		}
 
-		/* Wait for some more data, and load it. */
+		/*
+		 * Wait for some more data, and load it.  (Note: if the connection has
+		 * been lost, pqWait should return immediately because the socket
+		 * should be read-ready, either with the last server data or with an
+		 * EOF indication.  We expect therefore that this won't result in any
+		 * undue delay in reporting a previous write failure.)
+		 */
 		if (flushResult ||
 			pqWait(true, false, conn) ||
 			pqReadData(conn) < 0)
@@ -1820,6 +1826,17 @@
 
 		/* Parse it. */
 		parseInput(conn);
+
+		/*
+		 * If we had a write error, but nothing above obtained a query result
+		 * or detected a read error, report the write error.
+		 */
+		if (conn->write_failed && conn->asyncStatus == PGASYNC_BUSY)
+		{
+			pqSaveWriteError(conn);
+			conn->asyncStatus = PGASYNC_IDLE;
+			return pqPrepareAsyncResult(conn);
+		}
 	}
 
 	/* Return the appropriate thing. */
@@ -2252,7 +2269,7 @@ PQsendDescribe(PGconn *conn, char desc_type, const char *desc_target)
 	return 1;
 
 sendFailed:
-	pqHandleSendFailure(conn);
+	/* error message should be set up already */
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index e5ef8d4..718da1c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -824,6 +824,13 @@ definitelyFailed:
  *
  * Return 0 on success, -1 on failure and 1 when not all data could be sent
  * because the socket would block and the connection is non-blocking.
+ *
+ * Upon write failure, conn->write_failed is set and the error message is
+ * saved in conn->write_err_msg, but we clear the output buffer and return
+ * zero anyway; this is because callers should soldier on until it's possible
+ * to read from the server and check for an error message.  write_err_msg
+ * should be reported only when we are unable to obtain a server error first.
+ * (Thus, a -1 result is returned only for an internal *read* failure.)
  */
 static int
 pqSendSome(PGconn *conn, int len)
@@ -832,13 +839,32 @@ pqSendSome(PGconn *conn, int len)
 	int			remaining = conn->outCount;
 	int			result = 0;
 
+	/*
+	 * If we already had a write failure, we will never again try to send data
+	 * on that connection.  Even if the kernel would let us, we've probably
+	 * lost message boundary sync with the server.  conn->write_failed
+	 * therefore persists until the connection is reset, and we just discard
+	 * all data presented to be written.
+	 */
+	if (conn->write_failed)
+	{
+		/* conn->write_err_msg should be set up already */
+		conn->outCount = 0;
+		return 0;
+	}
+
 	if (conn->sock == PGINVALID_SOCKET)
 	{
 		printfPQExpBuffer(&conn->errorMessage,
 						  libpq_gettext("connection not open\n"));
+		conn->write_failed = true;
+		/* Transfer error message to conn->write_err_msg, if possible */
+		/* (strdup failure is OK, we'll cope later) */
+		conn->write_err_msg = strdup(conn->errorMessage.data);
+		resetPQExpBuffer(&conn->errorMessage);
 		/* Discard queued data; no chance it'll ever be sent */
 		conn->outCount = 0;
-		return -1;
+		return 0;
 	}
 
 	/* while there's still data to send */
@@ -876,17 +902,18 @@ pqSendSome(PGconn *conn, int len)
 				default:
 					/* pqsecure_write set the error message for us */
+					conn->write_failed = true;
 
 					/*
-					 * We used to close the socket here, but that's a bad idea
-					 * since there might be unread data waiting (typically, a
-					 * NOTICE message from the backend telling us it's
-					 * committing hara-kiri...).  Leave the socket open until
-					 * pqReadData finds no more data can be read.  But abandon
-					 * attempt to send data.
+					 * Transfer error message to conn->write_err_msg, if
+					 * possible (strdup failure is OK, we'll cope later)
 					 */
+					conn->write_err_msg = strdup(conn->errorMessage.data);
+					resetPQExpBuffer(&conn->errorMessage);
+
+					/* Discard queued data; no chance it'll ever be sent */
 					conn->outCount = 0;
-					return -1;
+					return 0;
 			}
 		}
 		else
@@ -921,6 +948,9 @@ pqSendSome(PGconn *conn, int len)
 			 * can do, and works pretty well in practice.  (The documentation
 			 * used to say that you only need to wait for write-ready, so
 			 * there are still plenty of applications like that out there.)
+			 *
+			 * Note that errors here don't result in write_failed becoming
+			 * set.
 			 */
 			if (pqReadData(conn) < 0)
 			{
@@ -956,6 +986,7 @@
  *
  * Return 0 on success, -1 on failure and 1 when not all data could be sent
  * because the socket would block and the connection is non-blocking.
+ * (See pqSendSome comments about how failure should be handled.)
  */
 int
 pqFlush(PGconn *conn)
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index 1ab1421..0e36974 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -1450,42 +1450,30 @@ pqFunctionCall2(PGconn *conn, Oid fnid,
 		pqPutInt(fnid, 4, conn) != 0 ||	/* function id */
 		pqPutInt(nargs, 4, conn) != 0)	/* # of args */
 	{
-		pqHandleSendFailure(conn);
+		/* error message should be set up already */
 		return NULL;
 	}
 
 	for (i = 0; i < nargs; ++i)
 	{							/* len.int4 + contents */
 		if (pqPutInt(args[i].len, 4, conn))
-		{
-			pqHandleSendFailure(conn);
 			return NULL;
-		}
 
 		if (args[i].isint)
 		{
 			if (pqPutInt(args[i].u.integer, 4, conn))
-			{
-				pqHandleSendFailure(conn);
 				return NULL;
-			}
 		}
 		else
 		{
 			if (pqPutnchar((char *) args[i].u.ptr, args[i].len, conn))
-			{
-				pqHandleSendFailure(conn);
 				return NULL;
-			}
 		}
 	}
 
 	if (pqPutMsgEnd(conn) < 0 ||
 		pqFlush(conn))
-	{
-		pqHandleSendFailure(conn);
 		return NULL;
-	}
 
 	for (;;)
 	{
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 47dbc31..ec51fee 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -1926,50 +1926,35 @@ pqFunctionCall3(PGconn *conn, Oid fnid,
 		pqPutInt(1, 2, conn) < 0 ||	/* format code: BINARY */
 		pqPutInt(nargs, 2, conn) < 0)	/* # of args */
 	{
-		pqHandleSendFailure(conn);
+		/* error message should be set up already */
 		return NULL;
 	}
 
 	for (i = 0; i < nargs; ++i)
 	{							/* len.int4 + contents */
 		if (pqPutInt(args[i].len, 4, conn))
-		{
-			pqHandleSendFailure(conn);
 			return NULL;
-		}
 
 		if (args[i].len == -1)
 			continue;			/* it's NULL */
 
 		if (args[i].isint)
 		{
 			if (pqPutInt(args[i].u.integer, args[i].len, conn))
-			{
-				pqHandleSendFailure(conn);
 				return NULL;
-			}
 		}
 		else
 		{
 			if (pqPutnchar((char *) args[i].u.ptr, args[i].len, conn))
-			{
-				pqHandleSendFailure(conn);
 				return NULL;
-			}
 		}
 	}
 
 	if (pqPutInt(1, 2, conn) < 0)	/* result format code: BINARY */
-	{
-		pqHandleSendFailure(conn);
 		return NULL;
-	}
 
 	if (pqPutMsgEnd(conn) < 0 ||
 		pqFlush(conn))
-	{
-		pqHandleSendFailure(conn);
 		return NULL;
-	}
 
 	for (;;)
 	{
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 4a93d8e..dbe0f7e 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -410,6 +410,8 @@ struct pg_conn
 	bool		password_needed;	/* true if server demanded a password */
 	bool		sigpipe_so;		/* have we masked SIGPIPE via SO_NOSIGPIPE? */
 	bool		sigpipe_flag;	/* can we mask SIGPIPE via MSG_NOSIGNAL? */
+	bool		write_failed;	/* have we had a write failure on sock? */
+	char	   *write_err_msg;	/* write error message, or NULL if OOM */
 
 	/* Transient state needed while establishing connection */
 	bool		try_next_addr;	/* time to advance to next address/host? */
@@ -585,7 +587,6 @@ extern void pqSaveMessageField(PGresult *res, char code,
 extern void pqSaveParameterStatus(PGconn *conn, const char *name,
 					  const char *value);
 extern int	pqRowProcessor(PGconn *conn, const char **errmsgp);
-extern void pqHandleSendFailure(PGconn *conn);
 
 /* === in fe-protocol2.c === */
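For what it's worth, the control flow the patch establishes can be modelled in a few lines of Python (a toy model with invented names, not libpq's API): after a send failure the error is recorded and all further output is silently discarded, and the recorded write error is surfaced only when reading produces nothing better.

```python
import socket

class ToyConn:
    """Toy model of the patch's write-failure bookkeeping (invented names)."""

    def __init__(self, sock):
        self.sock = sock
        self.write_failed = False   # cf. conn->write_failed
        self.write_err_msg = None   # cf. conn->write_err_msg

    def send(self, data):
        # Like pqSendSome after the patch: on failure, record the error and
        # discard this and all later output -- we've probably lost
        # message-boundary sync with the server anyway.
        if self.write_failed:
            return
        try:
            self.sock.sendall(data)
        except OSError as e:
            self.write_failed = True
            self.write_err_msg = str(e)

    def get_result(self):
        # Like PQgetResult after the patch: prefer whatever the server
        # managed to send us (e.g. the FATAL explaining the disconnect),
        # and fall back to the recorded write error only at EOF.
        try:
            data = self.sock.recv(4096)
        except OSError:
            data = b""
        if data:
            return data.decode()
        if self.write_failed:
            return self.write_err_msg or "write to server failed"
        return None
```

The point of the ordering in get_result() is exactly the behavior the patch wants: a server message that did arrive always wins over the locally recorded "could not send" error.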
On Tue, Mar 19, 2019 at 9:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > ... I don't like pqHandleSendFailure all that much: it has strong
> > constraints on what state libpq has to be in, as a consequence of which
> > it's called from a bunch of ad-hoc places, and can't be called from
> > some others.  It's kind of accidental that it will work here.
> > I was toying with the idea of getting rid of that function in
> > favor of a design like this:
> > * Handle any socket-write failure at some low level of libpq by
> > recording the fact that the error occurred (plus whatever data we
> > need to produce a message about it), and then starting to just
> > discard output data.
> > * Eventually, we'll try to read.  Process any available input data
> > as usual (and, if we get a read error, report that).  When no more
> > input data is available, if a socket write failure has been recorded,
> > report that much as if it were an incoming ERROR message, and then
> > shut down the connection.
> > This would essentially give us pqHandleSendFailure-like behavior
> > across the board, which might be enough to fix this problem as well as
> > bug #15598.  Or not ... as you say, we haven't thoroughly understood
> > either issue, so it's possible this wouldn't do it.
>
> Here's a draft patch along that line.  It's gratifyingly small, and
> it does fix the SSL problem for me.  I have no way of investigating
> whether it fixes bug #15598, but it might.
>
> Aside from the SSL cert-verify issue, I've checked the behavior when
> the backend is shut down ("pg_ctl stop") between queries, and when
> the backend is kill 9'd mid-query.  The shutdown case reacts well with
> or without SSL:
>
> regression=# select 2+2;
> FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The backend-crash case is fine without SSL:
>
> regression=# select * from tenk1 t1, tenk1 t2, tenk1 t3;
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> but a bit less nice with it:
>
> regression=# select * from tenk1 t1, tenk1 t2, tenk1 t3;
> SSL SYSCALL error: EOF detected
> The connection to the server was lost. Attempting reset: Succeeded.
>
> That seems cosmetic --- we just haven't expended the same effort on
> what to print for connection EOF with SSL as without it.  (Also,
> you get that same result without this patch.)
>
> Of course, this hardly counts as exhaustive testing, especially since
> I only tried one OpenSSL version.

Neat.  Fixes the problem for me on eelpout's host.

> > One thing that isn't real clear to me is how much timing sensitivity
> > there is in "when no more input data is available".  Can we assume that
> > if we've gotten ECONNRESET or an allied error from a write, then any
> > data the far end might've wished to send us is already available to
> > read?

Following a trail beginning at
https://en.wikipedia.org/wiki/Transmission_Control_Protocol I see that
RFC 1122 4.2.2.13 discusses this topic and possible variations in this
area.  I don't know enough about any of this stuff to draw hard
conclusions from primary sources, but it does appear that an
implementation might be within its rights to jettison that data,
unfortunately.  As seen with the simple Python script experiment
up-thread, we tested four implementations, and (amusingly) got four
different behaviours:

1. For Windows, the answer to your question was clearly "no" in that
experiment.
I think this behaviour sucks for any protocol that involves unsolicited
messages in both directions (that is, not based on pure request/response
processing), because there is no way to guarantee that you receive a
"goodbye" message, and that's what's happening in bug #15598.  I'm not
sure if there is anything we ever can do about this though, other than
complaining on the internet or opening a second TCP connection for
unsolicited server->client messages.

2. Linux, FreeBSD and Darwin gave slightly different error sequences
when writing after the remote connection was closed (though I suspect
they'd behave much the same way for a connection to a remote host), but
all allowed the "goodbye" message to be read, so the answer there is
"yes".

> > In principle, since TCP allows shutting down either transmission
> > direction independently, the server could close the read side of its
> > socket while continuing to transmit data.  In practice, PG servers
> > don't do that; but could network timing glitches create the same effect?
> > Even if it's possible, does it happen enough to worry about?

> > The reason I'm concerned is that I don't think it'd be bright to ignore a
> > send error until we see input EOF, which'd be the obvious way to solve a
> > timing problem if there is one.  If our send error was some transient
> > glitch and the connection is really still open, then we might not get EOF,
> > but we won't get a server reply either because no message went to the
> > server.  You could imagine waiting some small interval of time before
> > deciding that it's time to report the write failure, but ugh.
>
> As written, this patch does seem vulnerable to this concern, if it's real.
> We could address it for post-connection failures by tweaking PQgetResult
> so that it doesn't call pqWait if there's already a write error recorded;
> but I'm far from sure that that would be a net improvement.
> Also, making a comparable change to the behavior of the PQconnectPoll
> state machine might be tricky.

I think ignoring send errors until you eventually get a recv error or an
EOF is OK, because that's roughly equivalent to the way sending works in
TCP in general anyway; you find out that the other end has gone away a
bit later.  That is the reason you can't reproduce this problem on a
FreeBSD box with OpenSSL 1.1.1 even if you add the sleep, or
(presumably, though I didn't test) on a Linux system with a remote
server: the send() succeeds, because it doesn't or can't peek at the
other end's TCP state and see that it's been closed already; instead it
merely transmits the message, and the other end later sends back an RST
that'll cause some future send() syscall to fail.  I can't imagine what
sort of transient glitch could apply here.  I guess the question is: is
there a way for send() to report an error, but recv() to report EAGAIN
so we finish up blocked?  I can't immediately see how that could happen.

> My current feeling is that this is OK to put in HEAD but I think the
> risk-reward ratio isn't very good for the back branches.  Even with
> an OpenSSL version where this makes a difference, the problematic
> behavior is pretty hard to hit.  So I'm a bit inclined to do nothing
> in the back branches.

Shouldn't we also back-patch the one-line change adding
pqHandleSendFailure()?  Then eelpout would be green for REL_11_STABLE,
and any future buildfarm animals (probably underpowered ones with wonky
scheduling) when they are eventually upgraded to OpenSSL 1.1.1.

--
Thomas Munro
https://enterprisedb.com
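On the stacks where I can check, a peer that has hung up makes the socket read-ready immediately, and a subsequent recv() yields EOF (or an error such as ECONNRESET), not EAGAIN -- which is consistent with the patch's assumption that pqWait won't sit blocked after a send failure. A quick sanity check of that claim (an AF_UNIX sketch, so only suggestive for TCP across a real network):

```python
import select
import socket

# After the peer has hung up, the socket selects as readable, and a
# non-blocking recv() yields EOF (b"") or an error such as ECONNRESET --
# not EAGAIN -- so a client that stops sending and waits for input should
# not block forever on a dead connection.
a, b = socket.socketpair()
b.close()
a.setblocking(False)

readable, _, _ = select.select([a], [], [], 5)
assert a in readable  # dead peer: read-ready at once, select doesn't wait
try:
    data = a.recv(1024)
except OSError as e:
    data = e          # a reset would surface here instead of EAGAIN
print(repr(data))
```

The question of whether some stack somewhere reports the write error yet leaves the socket not-read-ready is exactly the residual doubt discussed above; this sketch can only show the common case, not rule out the odd one.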
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Mar 19, 2019 at 9:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My current feeling is that this is OK to put in HEAD but I think the
>> risk-reward ratio isn't very good for the back branches.  Even with
>> an OpenSSL version where this makes a difference, the problematic
>> behavior is pretty hard to hit.  So I'm a bit inclined to do nothing
>> in the back branches.

> Shouldn't we also back-patch the one-line change adding
> pqHandleSendFailure()?

As I said before, I don't like that patch: at best it's an abuse of
pqHandleSendFailure, because that function is only meant to be called
at start of a query cycle.  It wouldn't be hard to break this usage and
not notice, especially given that we often don't test back-patched
changes very carefully in the back branches if they seem OK in HEAD.

Possibly we could consider back-patching the more aggressive patch
once it's survived v12 beta testing, and just living with the issue
till then.  Given what we know now, I don't think this is a big
problem for the field: how many people use SSL on local connections?

			regards, tom lane
On Tue, Mar 19, 2019 at 12:25 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> 2. Linux, FreeBSD and Darwin gave slightly different error sequences
> when writing after the remote connection was closed (though I suspect
> they'd behave much the same way for a connection to a remote host),
> but all allowed the "goodbye" message to be read, so the answer there
> is "yes".

Qualification, after some off-list discussion with Andrew Gierth: even
though that appears to work in happy tests on those TCP
implementations, it may be that they can't really guarantee that it
works in adverse conditions, because the server may not be able to
retransmit packets that preceded the reset but somehow went missing,
and it's also not clear what happens if they arrive out of order.
Dunno.

(Oh man, all this hidden magic: is it actually possible to write
working application code without imbibing Stevens' TCP/IP Illustrated?
It's a bit like last year's work on filesystem quirks, when I realised
that man pages are nice and all, but you basically need to read the
source or talk to someone who has...)

-- 
Thomas Munro
https://enterprisedb.com
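The happy-path behavior referred to above (the "goodbye" message
remaining readable after the server closes) can be demonstrated with a
short sketch; the message text is illustrative, standing in for the
server's SSL alert:

```python
import socket

# Loopback server/client pair.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket()
client.connect(server.getsockname())
conn, _ = server.accept()

# Server sends its parting message and closes immediately (FIN).
conn.sendall(b"alert: certificate revoked")
conn.close()

# The client can still read the buffered message, then sees a clean EOF.
goodbye = client.recv(1024)
eof = client.recv(1024)       # b"" signals orderly end-of-stream

client.close()
server.close()
```

As the qualification in the mail notes, this demonstrates only the
happy path; it says nothing about what happens if segments preceding a
reset are lost or reordered in transit.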
On Tue, Mar 19, 2019 at 12:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> Shouldn't we also back-patch the one-line change adding
>> pqHandleSendFailure()?
>
> As I said before, I don't like that patch: at best it's an abuse of
> pqHandleSendFailure, because that function is only meant to be called
> at start of a query cycle. It wouldn't be hard to break this usage and
> not notice, especially given that we often don't test back-patched
> changes very carefully in the back branches if they seem OK in HEAD.
>
> Possibly we could consider back-patching the more aggressive patch
> once it's survived v12 beta testing, and just living with the issue
> till then. Given what we know now, I don't think this is a big
> problem for the field: how many people use SSL on local connections?

Yeah, now that we understand this properly I agree this is unlikely to
bother anyone in real life. I just want to make the build farm green.
I wondered about ssl_max_protocol_version = 'TLSv1.2', but that GUC's
too new. Another option would be to change the "command_fails_like"
pattern to tolerate both errors in v11.

-- 
Thomas Munro
https://enterprisedb.com
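For illustration, the tolerant "command_fails_like" pattern suggested
here could be an alternation matching either symptom. The actual test
is a Perl TAP test, so this Python sketch is only an approximation; the
message texts are taken from the strace transcripts earlier in the
thread:

```python
import re

# Hypothetical tolerant pattern: accept the expected SSL alert, or the
# connection-reset symptom seen intermittently on eelpout.
tolerant = re.compile(
    r"SSL error: sslv3 alert certificate revoked"
    r"|server closed the connection unexpectedly"
)

expected = "psql: SSL error: sslv3 alert certificate revoked"
flaky = "psql: server closed the connection unexpectedly"

matches_expected = bool(tolerant.search(expected))
matches_flaky = bool(tolerant.search(flaky))
```

The cost of this stopgap is that a genuine regression producing the
reset message would no longer be caught in v11, which is why it is
framed as buildfarm-only relief.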
Thomas Munro <thomas.munro@gmail.com> writes:
> Yeah, now that we understand this properly I agree this is unlikely to
> bother anyone in real life. I just want to make the build farm green.
> I wondered about ssl_max_protocol_version = 'TLSv1.2', but that GUC's
> too new.

Yeah; also, forcing that would reduce our test coverage.

> Another option would be to change the "command_fails_like"
> pattern to tolerate both errors in v11.

This might be a reasonable stopgap measure, since it'd basically only
affect buildfarm results.

			regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Mar 19, 2019 at 9:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> One thing that isn't real clear to me is how much timing sensitivity
>>> there is in "when no more input data is available". Can we assume
>>> that if we've gotten ECONNRESET or an allied error from a write,
>>> then any data the far end might've wished to send us is already
>>> available to read?

> Following a trail beginning at
> https://en.wikipedia.org/wiki/Transmission_Control_Protocol I see that
> RFC 1122 4.2.2.13 discusses this topic and possible variations in this
> area. I don't know enough about any of this stuff to draw hard
> conclusions from primary sources, but it does appear that an
> implementation might be within its rights to jettison that data,
> unfortunately.

I spent some time looking at that, and as far as I can see, the
behavior reported for Windows is flat out forbidden by TCP. RFC 1122
does discuss the possibility that an O/S might not support half-closed
connections; but that would only matter if (in Unix terms) we issued
shutdown(sock, SHUT_WR) and expected to continue to be able to read,
which we do not. In any other scenario, TCP is supposed to deliver any
data it successfully received.

We can't, of course, work magic and retrieve data the TCP stack never
got, nor is there much we can do about it if the stack throws away
data despite the spec. So I'm not inclined to fuss about the corner
cases. The main thing for us is to ensure that if a server error
message is available to read, we do so and return it rather than
returning a less-helpful bleat about being unable to write. The
proposed patch does that. It will also tend to report bleats about
read failure rather than write failure, even if from libpq's
perspective the write failure happened first; but that seems fine to
me.
>>> The reason I'm concerned is that I don't think it'd be bright to
>>> ignore a send error until we see input EOF, which'd be the obvious
>>> way to solve a timing problem if there is one. If our send error
>>> was some transient glitch and the connection is really still open,
>>> then we might not get EOF, but we won't get a server reply either
>>> because no message went to the server. You could imagine waiting
>>> some small interval of time before deciding that it's time to
>>> report the write failure, but ugh.

I'm likewise inclined to dismiss this worry, because I don't see how
it could happen, given that the server doesn't use shutdown(2). A
send error should imply either that the kernel saw an RST from the
remote end, or that the connection is local and the kernel knows that
the server closed its socket or crashed. In either scenario the
kernel should consider that the incoming data stream is ended; maybe
it'll give us whatever data it received or maybe not, but it shouldn't
allow us to sit and wait for more data. Or in other words: there are
no client-visible "transient glitches" in TCP. Either the connection
is still up, or it's been reset.

So I'm inclined to (1) commit the patch as-proposed in HEAD, and
(2) hack the ssl test cases in v11 as you suggested. If we see field
complaints about this, we can consider reverting (2) in favor of a
back-patch once v12 beta is over.

			regards, tom lane
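The patch's central idea, namely that on a send failure the client
should try to read the server's parting error message before reporting
the write error, can be sketched deterministically with a Unix
socketpair. This is only an illustration of the pattern, not libpq's
actual code, and the message text is made up:

```python
import socket

# A socketpair stands in for the client/server link; unlike loopback
# TCP, a send to a closed peer fails immediately, making the demo
# deterministic.
us, them = socket.socketpair()
them.sendall(b"FATAL: certificate verify failed")  # server's goodbye
them.close()                                       # server process exits

# Our write fails because the peer is gone (BrokenPipeError/EPIPE).
err = None
try:
    us.sendall(b"startup packet")
except OSError as e:
    err = e

# Rather than reporting the bare write failure, drain the socket first:
# the error message the server sent before exiting is still buffered
# and readable, and is far more useful to show the user.
goodbye = us.recv(1024) if err is not None else b""
us.close()
```

This mirrors the desired psql outcome: report "sslv3 alert certificate
revoked" (the server's message) instead of "could not send startup
packet: Connection reset by peer".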
On Wed, Mar 20, 2019 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I'm inclined to (1) commit the patch as-proposed in HEAD, and
> (2) hack the ssl test cases in v11 as you suggested. If we see field
> complaints about this, we can consider reverting (2) in favor of
> a back-patch once v12 beta is over.

This plan looks good to me.

-- 
Thomas Munro
https://enterprisedb.com