Обсуждение: BUG #10141: Server fails to send query result.
The following bug has been logged on the website: Bug reference: 10141 Logged by: Evgen Bodunov Email address: molind@gmail.com PostgreSQL version: 9.3.4 Operating system: Ubuntu 13.10 Description: I have big database with geospatial data from OpenStreetMap. I prepare vector tiles from this data using console app which requests data from database for each tile in given area. While it works it opens 1 db connection per thread (8 threads) and queries all tiles from list 25k tiles in average. 5 queries per tile. During map generation usually 2 threads stuck in idle state. http://i.stack.imgur.com/VUU4O.png Look at the last row. Query started. We have a timestamp, then it finished and changed state to idle, and we have timestamp of that moment. Everything looks ok, except console app which waits answer from server inside PGresult *res = PQexec(conn, query); It waits for query results from server. And I can't find why server thinks that job is done and app says that there was no answer from db yet. I added PQtrace for each thread and wait for stuck thread. There is trace: >From backend> D >From backend (#4)> 139891 >From backend> D >From backend (#4)> 139891 >From backend> D >From backend (#4)> 139891 >From backend> D >From backend (#4)> 139891 >From backend> D >From backend (#4)> 139891 >From backend> D >From backend (#4)> 139891 Seems problem somewhere inside PostgreSQL. It tries to send result but fails. PostgreSQL 9.3.4-1.pgdg70+1 installed from apt source deb http://apt.postgresql.org/pub/repos/apt/ wheezy-pgdg main
Greetings, * molind@gmail.com (molind@gmail.com) wrote: > While it works it opens 1 db connection per thread (8 threads) and queries > all tiles from list 25k tiles in average. 5 queries per tile. During map > generation usually 2 threads stuck in idle state. >=20 > http://i.stack.imgur.com/VUU4O.png This only shows what's happening server-side. > Look at the last row. Query started. We have a timestamp, then it finished > and changed state to idle, and we have timestamp of that moment. Everythi= ng > looks ok, except console app which waits answer from server inside >=20 > PGresult *res =3D PQexec(conn, query); >=20 > It waits for query results from server. And I can't find why server thinks > that job is done and app says that there was no answer from db yet. Can you provide a small example test case which illustrates this? You didn't provide your code, so it's really difficult to have any idea what's going on. What version of libpq is being used and on what platform? Are you using SSL at all for your connection? > I added PQtrace for each thread and wait for stuck thread. There is trace: >=20 > >From backend> D > >From backend (#4)> 139891 > >From backend> D > >From backend (#4)> 139891 > >From backend> D > >From backend (#4)> 139891 > >From backend> D > >From backend (#4)> 139891 > >From backend> D > >From backend (#4)> 139891 > >From backend> D > >From backend (#4)> 139891 >=20 > Seems problem somewhere inside PostgreSQL. It tries to send result but > fails. I don't see anything here which shows that to be the case. Is the TCP connection still fully ESTABLISHED on both sides? What's in the middle..? Could you have a firewall or IDS which is dropping the connections? Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * molind@gmail.com (molind@gmail.com) wrote: >> I added PQtrace for each thread and wait for stuck thread. There is trace: >> From backend> D >> From backend (#4)> 139891 >> From backend> D >> From backend (#4)> 139891 >> From backend> D >> From backend (#4)> 139891 >> From backend> D >> From backend (#4)> 139891 >> From backend> D >> From backend (#4)> 139891 >> From backend> D >> From backend (#4)> 139891 FWIW, this looks suspiciously like libpq is unable to consume data from its input buffer (and keeps retrying to process the same data). Are you using PQsetnonblocking by any chance? If so, this probably indicates failure to follow the required call sequencing to process data. Another likely theory, given that you mentioned multiple client threads, is that the threads are stomping on each others' toes somehow. libpq does not defend itself against that: it's up to you to be sure that only one thread is touching each PGconn object. >> Seems problem somewhere inside PostgreSQL. It tries to send result but >> fails. > I don't see anything here which shows that to be the case. Indeed. I'd bet considerable money that this is a client-side issue. It's possible that it's a libpq bug, but what seems more likely is that you're using libpq improperly. regards, tom lane
There is more details: Even when it works it sends D more than once usually =46rom backend> D =46rom backend (#4)> 33170 =46rom backend> D =46rom backend (#4)> 33170 =46rom backend> D =46rom backend (#4)> 33170 =46rom backend> D =46rom backend (#4)> 33170 =46rom backend> D =46rom backend (#4)> 33170 =46rom backend (#2)> 2 =46rom backend (#4)> 33132 =46rom backend (33132)> = \x01060000006c0000000103000000010000000a00000080ce26eb13eaa640003fad186435= 9840800c2b00bdeea6400055d =46rom backend (#4)> 24 =46rom backend (24)> "landuse"=3D>"residential" My query looks like select * from poly9(3130860.6785608195, 7592337.1455099881, = 3209132.1955248402, 7670608.6624740083); while poly9 is function generated by app before data processing starts. = And basically it looks like: CREATE FUNCTION "public"."poly9"(IN minx numeric, IN miny numeric, IN = maxx numeric, IN maxy numeric)=20 returns table (geom_out bytea, tags_out public.hstore)=20 AS $BODY$ declare bbox =3D st_makebox2d(st_makepoint(minx, miny), st_makepoint(maxx, = maxy)); begin -- simplified version return query SELECT way, tags FROM planet_polygon WHERE way && bbox;=20 end; $BODY$ LANGUAGE plpgsql COST 100 ROWS 1000 CALLED ON NULL INPUT SECURITY INVOKER VOLATILE; Hope it helps. This solution worked fine 2 months ago. Is there a small = chance that something broken inside 9.3.4? Best Regards, Evgen Bodunov. On Apr 25, 2014, at 10:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Stephen Frost <sfrost@snowman.net> writes: >> * molind@gmail.com (molind@gmail.com) wrote: >>> I added PQtrace for each thread and wait for stuck thread. There is = trace: >=20 >>> =46rom backend> D >>> =46rom backend (#4)> 139891 >>> =46rom backend> D >>> =46rom backend (#4)> 139891 >>> =46rom backend> D >>> =46rom backend (#4)> 139891 >>> =46rom backend> D >>> =46rom backend (#4)> 139891 >>> =46rom backend> D >>> =46rom backend (#4)> 139891 >>> =46rom backend> D >>> =46rom backend (#4)> 139891 >=20 > FWIW, this looks suspiciously like libpq is unable to consume data = from > its input buffer (and keeps retrying to process the same data). Are = you > using PQsetnonblocking by any chance? If so, this probably indicates > failure to follow the required call sequencing to process data. >=20 > Another likely theory, given that you mentioned multiple client = threads, > is that the threads are stomping on each others' toes somehow. libpq > does not defend itself against that: it's up to you to be sure that > only one thread is touching each PGconn object. >=20 >>> Seems problem somewhere inside PostgreSQL. It tries to send result = but >>> fails. >=20 >> I don't see anything here which shows that to be the case. >=20 > Indeed. I'd bet considerable money that this is a client-side issue. > It's possible that it's a libpq bug, but what seems more likely is > that you're using libpq improperly. >=20 > regards, tom lane
Hello guys,=20 I checked it with direct ethernet connection and now it works without = issues. When i switched OpenVPN server from "proto udp=94 to =93proto = tcp" mode it seems also works ok. Thanks for your help. Best Regards, Evgen Bodunov. On Apr 25, 2014, at 11:55 PM, Evgen Bodunov <molind@gmail.com> wrote: > There is more details: >=20 > Even when it works it sends D more than once usually >=20 > =46rom backend> D > =46rom backend (#4)> 33170 > =46rom backend> D > =46rom backend (#4)> 33170 > =46rom backend> D > =46rom backend (#4)> 33170 > =46rom backend> D > =46rom backend (#4)> 33170 > =46rom backend> D > =46rom backend (#4)> 33170 > =46rom backend (#2)> 2 > =46rom backend (#4)> 33132 > =46rom backend (33132)> = \x01060000006c0000000103000000010000000a00000080ce26eb13eaa640003fad186435= 9840800c2b00bdeea6400055d > =46rom backend (#4)> 24 > =46rom backend (24)> "landuse"=3D>"residential" >=20 > My query looks like > select * from poly9(3130860.6785608195, 7592337.1455099881, = 3209132.1955248402, 7670608.6624740083); >=20 > while poly9 is function generated by app before data processing = starts. And basically it looks like: >=20 > CREATE FUNCTION "public"."poly9"(IN minx numeric, IN miny numeric, IN = maxx numeric, IN maxy numeric)=20 > returns table (geom_out bytea, tags_out public.hstore)=20 > AS $BODY$ > declare > bbox =3D st_makebox2d(st_makepoint(minx, miny), st_makepoint(maxx, = maxy)); > begin > -- simplified version > return query > SELECT way, tags FROM planet_polygon WHERE way && bbox;=20 > end; > $BODY$ > LANGUAGE plpgsql > COST 100 > ROWS 1000 > CALLED ON NULL INPUT > SECURITY INVOKER > VOLATILE; >=20 > Hope it helps. This solution worked fine 2 months ago. Is there a = small chance that something broken inside 9.3.4? >=20 > Best Regards, > Evgen Bodunov. >=20 > On Apr 25, 2014, at 10:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >=20 >> Stephen Frost <sfrost@snowman.net> writes: >>> * molind@gmail.com (molind@gmail.com) wrote: >>>> I added PQtrace for each thread and wait for stuck thread. There is = trace: >>=20 >>>> =46rom backend> D >>>> =46rom backend (#4)> 139891 >>>> =46rom backend> D >>>> =46rom backend (#4)> 139891 >>>> =46rom backend> D >>>> =46rom backend (#4)> 139891 >>>> =46rom backend> D >>>> =46rom backend (#4)> 139891 >>>> =46rom backend> D >>>> =46rom backend (#4)> 139891 >>>> =46rom backend> D >>>> =46rom backend (#4)> 139891 >>=20 >> FWIW, this looks suspiciously like libpq is unable to consume data = from >> its input buffer (and keeps retrying to process the same data). Are = you >> using PQsetnonblocking by any chance? If so, this probably indicates >> failure to follow the required call sequencing to process data. >>=20 >> Another likely theory, given that you mentioned multiple client = threads, >> is that the threads are stomping on each others' toes somehow. libpq >> does not defend itself against that: it's up to you to be sure that >> only one thread is touching each PGconn object. >>=20 >>>> Seems problem somewhere inside PostgreSQL. It tries to send result = but >>>> fails. >>=20 >>> I don't see anything here which shows that to be the case. >>=20 >> Indeed. I'd bet considerable money that this is a client-side issue. >> It's possible that it's a libpq bug, but what seems more likely is >> that you're using libpq improperly. >>=20 >> regards, tom lane >=20