Discussion: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
Hi folks,

with 12.1, after a couple of queries, at a random place, the clientlib
does produce a failed query without giving reason or error-message [1].
Then when retrying, the clientlib switches off signal handling and
sits inactive in memory (needs kill -9).

The server log shows no error or other hint.
The behaviour happens rarely with trust access, and almost always when
using Kerberos5 (Heimdal as included in FreeBSD).

11.5 clientlib has none of this behaviour and seems to work fine, like
10.10 did.

Environment:
OS FreeBSD 11.3
Applic. Ruby-on-Rails, ruby=2.5.7, gem 'pg'=1.2.2
(it makes no difference if that one is compiled with
the 12.1 or the 10.10 library)
Server 12.1

[1] the message from ruby is
PG::ConnectionBad: PQconsumeInput() : <query>

rgds,
PMc
On 1/9/20 10:18 AM, Peter wrote:
> Hi folks,
>
> with 12.1, after a couple of queries, at a random place, the clientlib
> does produce a failed query without giving reason or error-message [1].
> Then when retrying, the clientlib switches off signal handling and
> sits inactive in memory (needs kill -9).
>
> The server log shows no error or other hint.
> The behaviour happens rarely with trust access, and almost always when
> using Kerberos5 (Heimdal as included in FreeBSD).
>
> 11.5 clientlib has none of this behaviour and seems to work fine, like
> 10.10 did.

Might want to take a look at below:

https://github.com/ged/ruby-pg/issues/311

>
> Environment:
> OS FreeBSD 11.3
> Applic. Ruby-on-Rails, ruby=2.5.7, gem 'pg'=1.2.2
> (it makes no difference if that one is compiled with
> the 12.1 or the 10.10 library)
> Server 12.1
>
> [1] the message from ruby is
> PG::ConnectionBad: PQconsumeInput() : <query>
>
> rgds,
> PMc

-- 
Adrian Klaver
adrian.klaver@aklaver.com
Peter <pmc@citylink.dinoex.sub.org> writes:
> with 12.1, after a couple of queries, at a random place, the clientlib
> does produce a failed query without giving reason or error-message [1].
> Then when retrying, the clientlib switches off signal handling and
> sits inactive in memory (needs kill -9).

Seems like you'd better raise this with the author(s) of the "pg"
Ruby gem. Perhaps they read this mailing list, but more likely
they have a specific bug reporting mechanism somewhere.

regards, tom lane
On Thu, Jan 09, 2020 at 10:47:00AM -0800, Adrian Klaver wrote:
!
! Might want to take a look at below:
!
! https://github.com/ged/ruby-pg/issues/311

Thanks a lot! That option

> gssencmode: "disable"

seems to solve the issue.

But I think the people there are concerned with a different issue:
they are worried about fork(), while my flaw also appears when I do
*NOT* fork.
Also the picture is slightly different: they get segfaults, I get
misbehaviour.

rgds,
PMc
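For clients built on libpq, the workaround amounts to putting `gssencmode=disable` into the connection string (the `PGGSSENCMODE` environment variable used with psql later in the thread is the equivalent setting). The sketch below is a minimal illustration, assuming the client hands libpq a keyword/value connection string; `build_conninfo` and `quote_conninfo_value` are hypothetical helper names, and the quoting follows libpq's documented rules for that format.

```python
from typing import Mapping

# Values accepted by libpq's gssencmode option (PostgreSQL 12 and later).
GSSENCMODE_VALUES = {"disable", "prefer", "require"}


def quote_conninfo_value(value: str) -> str:
    """Quote a value per libpq's keyword/value rules.

    Empty values and values containing whitespace must be single-quoted;
    single quotes and backslashes inside a value are backslash-escaped.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    if value == "" or escaped != value or any(c.isspace() for c in value):
        return f"'{escaped}'"
    return value


def build_conninfo(params: Mapping[str, str]) -> str:
    """Hypothetical helper: join keyword=value pairs into a conninfo string."""
    mode = params.get("gssencmode")
    if mode is not None and mode not in GSSENCMODE_VALUES:
        raise ValueError(f"unsupported gssencmode: {mode!r}")
    return " ".join(f"{k}={quote_conninfo_value(v)}" for k, v in params.items())


print(build_conninfo({"host": "myhost", "dbname": "flowmdev", "gssencmode": "disable"}))
# host=myhost dbname=flowmdev gssencmode=disable
```

The resulting string can be given to psql in place of the database name, e.g. `psql "host=myhost dbname=flowmdev gssencmode=disable"`; `host` and `dbname` here are just the values from this thread.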
On Thu, Jan 09, 2020 at 01:48:01PM -0500, Tom Lane wrote:
! Peter <pmc@citylink.dinoex.sub.org> writes:
! > with 12.1, after a couple of queries, at a random place, the clientlib
! > does produce a failed query without giving reason or error-message [1].
! > Then when retrying, the clientlib switches off signal handling and
! > sits inactive in memory (needs kill -9).
!
! Seems like you'd better raise this with the author(s) of the "pg"
! Ruby gem. Perhaps they read this mailing list, but more likely
! they have a specific bug reporting mechanism somewhere.

Tom,
I don't think this has anything to do with "pg". Just checked: I get
garbage and misbehaviour on the "psql" command line tool also:

$ psql -h myhost flowmdev
psql (12.1)
GSSAPI-encrypted connection
Type "help" for help.

flowmdev=> select * from flows;
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
flowmdev=> select * from flows;
server sent data ("D" message) without prior row description ("T" message)
flowmdev=> select * from flows;
message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
id | name | ...
<here finally starts the data as expected>

In contrast:

$ PGGSSENCMODE="disable" psql -h myhost flowmdev
psql (12.1)
Type "help" for help.

flowmdev=> select * from flows;
id | name | ...
<all working as normal>

rgds,
PMc
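The hex codes in psql's complaints are protocol message type bytes: in the v3 frontend/backend protocol every backend message starts with a one-byte type, so 0x44 is ASCII `D` (DataRow) and 0x54 is `T` (RowDescription), which matches psql's own "D"/"T" wording in the session above. One reading of "arrived from server while idle" is that the client had already finished with the query when more row data turned up, i.e. it had lost its place in the message stream. The sketch below splits a byte buffer using the documented framing (type byte, then a 4-byte big-endian length that counts itself but not the type byte); `split_messages` and `msg` are hypothetical names, and the type table lists only a few message types.

```python
import struct
from typing import List, Tuple

# A few backend message type codes from the v3 frontend/backend protocol.
BACKEND_TYPES = {
    b"T": "RowDescription",  # 0x54
    b"D": "DataRow",         # 0x44
    b"C": "CommandComplete",
    b"Z": "ReadyForQuery",
    b"E": "ErrorResponse",
}


def split_messages(buf: bytes) -> Tuple[List[Tuple[bytes, bytes]], bytes]:
    """Split buf into (type, payload) pairs plus any incomplete tail.

    Framing: 1 type byte, then a 4-byte big-endian length that includes
    the length field itself but not the type byte, then the payload.
    """
    messages = []
    pos = 0
    while len(buf) - pos >= 5:
        mtype = buf[pos:pos + 1]
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        if length < 4:
            raise ValueError(f"corrupt length {length} at offset {pos}")
        end = pos + 1 + length
        if end > len(buf):
            break  # incomplete message: keep it for the next read
        messages.append((mtype, buf[pos + 5:end]))
        pos = end
    return messages, buf[pos:]


def msg(mtype: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: frame one backend message."""
    return mtype + struct.pack("!I", len(payload) + 4) + payload


# One DataRow with a single 1-byte column value "x", then ReadyForQuery (idle).
stream = msg(b"D", b"\x00\x01\x00\x00\x00\x01x") + msg(b"Z", b"I")
messages, rest = split_messages(stream)
print([(hex(t[0]), BACKEND_TYPES[t]) for t, _ in messages])
# [('0x44', 'DataRow'), ('0x5a', 'ReadyForQuery')]
```

A client that has already consumed ReadyForQuery and then receives further `D` messages would be in exactly the "idle" state psql reports; this sketch only decodes the framing and does not show why the stream got out of step.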
Peter <pmc@citylink.dinoex.sub.org> writes:
> I don't think this has anything to do with "pg". Just checked: I get
> garbage and misbehaviour on the "psql" command line tool also:
>
> $ psql -h myhost flowmdev
> psql (12.1)
> GSSAPI-encrypted connection
> Type "help" for help.
>
> flowmdev=> select * from flows;
> message type 0x44 arrived from server while idle
> message type 0x44 arrived from server while idle
> message type 0x44 arrived from server while idle

Oh ... that does look pretty broken. However, we've had no other similar
reports, so there must be something unique to your configuration. Busted
GSSAPI library, or some ABI inconsistency, perhaps? What platform are you
on, and how did you build or obtain this Postgres code?

regards, tom lane
On Thu, Jan 09, 2020 at 04:31:44PM -0500, Tom Lane wrote:
! Peter <pmc@citylink.dinoex.sub.org> writes:
! > flowmdev=> select * from flows;
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
!
! Oh ... that does look pretty broken. However, we've had no other similar
! reports, so there must be something unique to your configuration. Busted
! GSSAPI library, or some ABI inconsistency, perhaps? What platform are you
! on, and how did you build or obtain this Postgres code?

This is a FreeBSD 11.3-p3 r351611 built from source. Postgres is built
from
https://svn0.eu.freebsd.org/ports/branches/2019Q4 (rel. 12r1) or
https://svn0.eu.freebsd.org/ports/branches/2020Q1 (rel. 12.1)
with "make package install".

I have a build environment for base & ports that forces recompiles on
any change, which should make ABI inconsistencies quite hard to create.

All local patches are versioned and documented; there are none that I
could imagine influencing this.

There are no patches on Postgres, and none on the GSSAPI either.
There are a couple of patches on Heimdal, to fix broken command-line
parsing, broken pidfile handling and broken daemonization. None of them
touches the core functionality (like key handling).

But I just noticed something of interest (which I had taken for granted
when importing the database): the flaw does NOT appear when accessing
the database from the server's local system (with TCP and GSSAPI
encryption active). It appears only from a remote system.

But then, if I go onto the local system and change the MTU:

# ifconfig lo0 mtu 1500

and restart the server, then I get the exact same errors locally.

I can't make sense of that. With the default lo0 MTU of 16384 the
packets go on the network with the full 8256 bytes you send. With MTU
1500 they are split into 1448-byte pieces, but TCP is supposed to handle
this transparently. And what difference would the encryption make here?

> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536

These are also bigger. No, I don't understand that.

The only unusual thing: these are all VIMAGE jails. VIMAGE was considered
'experimental' some time ago and became production-ready in FreeBSD
12.0; 11.3 has a lower version number but was released later than
12.0, whatever that implies.

Another thing I found out: the slower the network, the worse the
errors. So might it be that nobody has complained simply because people
who use GSSAPI usually also have very fast machines and networks
nowadays?

When I go down to packet-radio speed:

# ipfw pipe 4 config bw 10kbit/s

then I can see the query return empty as soon as the first bytes are
received, without even waiting the 8 seconds for the first block to
arrive:

flowmdev=# select * from flows;
flowmdev=#

rgds,
PMc
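To see why segment size and link speed should not matter to a correct receiver, it helps to recall how PostgreSQL frames GSSAPI-encrypted traffic: as the PostgreSQL source describes it, each encrypted packet on the wire is a 4-byte length in network byte order followed by that many bytes of GSSAPI-wrapped data. The sketch below is a hypothetical illustration, not libpq's code and not a diagnosis of 12.1: it reassembles such packets from reads of arbitrary size, so 16384-byte reads (default lo0 MTU), 1448-byte reads (MTU 1500) and 1-byte reads (a very slow link) all yield identical packets. A receiver that instead treated a short read as a finished packet would break more often as the pieces get smaller, which is at least consistent with the symptoms reported above. `chunked`, `frame` and `read_packets` are hypothetical names.

```python
import struct
from typing import Iterable, Iterator, List


def chunked(data: bytes, size: int) -> Iterator[bytes]:
    """Deliver data in pieces of at most `size` bytes, like successive TCP reads."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def frame(packet: bytes) -> bytes:
    """Prefix a packet with its 4-byte big-endian (network order) length."""
    return struct.pack("!I", len(packet)) + packet


def read_packets(chunks: Iterable[bytes]) -> List[bytes]:
    """Reassemble length-prefixed packets from arbitrarily split reads.

    Bytes are buffered until both the 4-byte header and the full body are
    present; a short read is never treated as a complete packet.
    """
    buf = bytearray()
    packets = []
    for chunk in chunks:
        buf += chunk
        while len(buf) >= 4:
            (length,) = struct.unpack("!I", buf[:4])
            if len(buf) < 4 + length:
                break  # partial packet: wait for the next read
            packets.append(bytes(buf[4:4 + length]))
            del buf[:4 + length]
    if buf:
        raise ValueError(f"stream ended inside a packet ({len(buf)} bytes left over)")
    return packets


# Three 8256-byte payloads, delivered with different read sizes.
payloads = [bytes([n]) * 8256 for n in range(3)]
stream = b"".join(frame(p) for p in payloads)
for size in (16384, 1448, 1):
    assert read_packets(chunked(stream, size)) == payloads
```

The 8256-byte payload size is taken from the message above; the actual on-wire sizes in libpq may differ.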