Discussion: backend exit mystery
I have a libpq client program that repeatedly connects to a DB, queries, and then disconnects. After a seemingly random number of such successful sessions (sometimes 30, sometimes hundreds), the backend mysteriously exits after the client calls PQsetdbLogin(), and the client hangs. Any clues? Details below...

Client: C program linked with 7.2.1 libpq on HP-UX B.11.00 E 9000/803.
Server: PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96.

Client code snippet:

     1  if (text_db_conn == NULL || PQstatus(text_db_conn) != CONNECTION_OK) {
     2      if (text_db_conn!=NULL) PQfinish(text_db_conn);
     3      fprintf(stderr,"Connecting to DB...\n");
     4      fflush(stderr);
     5      text_db_conn = PQsetdbLogin(IP, PORT, NULL, NULL,
     6                                  "mydb", "myuser", NULL);
     7      if (PQstatus(text_db_conn) == CONNECTION_BAD) {
     8          fprintf(stderr,"Connection attempt failed.\n");
     9      } else {
    10          fprintf(stderr,"Connected.\n");
    11      }
    12  }

Client hangs after line 5. Client backtrace when hanging:

    (gdb) bt
    #0  0x20da28 in _select_sys ()
    #1  0x1ec788 in select ()
    #2  0xb9818 in pqWait ()
    #3  0x4000f0e0 in __d_trap_fptr ()
    #4  0x1ec788 in select ()
    Error accessing memory address 0xffffffbf: Bad address.
Server log (with server_min_messages = debug5) shows:

    2003-10-10 17:04:01 [28501] DEBUG: BackendStartup: forked pid=20296 socket=8
    2003-10-10 17:04:01 [20296] LOG: connection received: host=10.0.1.1 port=61438
    2003-10-10 17:05:34 [28501] DEBUG: reaping dead processes
    2003-10-10 17:05:34 [28501] DEBUG: child process (pid 20296) exited with exit code 0

I attached to this backend before it exited, and got this backtrace:

    (gdb) bt
    #0  0x420e8182 in recv () from /lib/i686/libc.so.6
    #1  0x081115c8 in secure_read (port=0x82be9f0, ptr=0x826f100, len=8192) at be-secure.c:301
    #2  0x08115322 in pq_recvbuf () at pqcomm.c:463
    #3  0x08115439 in pq_getbytes (s=0xbfffdd70 "Ho*\b8?*\b8???d\210\024", len=4) at pqcomm.c:538
    #4  0x081472fa in ProcessStartupPacket (port=0x82be9f0, SSLdone=0 '\0') at postmaster.c:1094
    #5  0x08148914 in DoBackend (port=0x82be9f0) at postmaster.c:2178
    #6  0x081483ff in BackendStartup (port=0x82be9f0) at postmaster.c:1924
    #7  0x081471f1 in ServerLoop () at postmaster.c:1027
    #8  0x08146be6 in PostmasterMain (argc=4, argv=0x82a5ae8) at postmaster.c:788
    #9  0x081160dc in main (argc=4, argv=0xbfffe8b4) at main.c:210
    #10 0x42017499 in __libc_start_main () from /lib/i686/libc.so.6
    (gdb) p debug_query_string
    $1 = 0x0

TIA.
On Fri, 10 Oct 2003, Ed L. wrote:

> I have a libpq client program that repeatedly connects to a DB, queries, and
> then disconnects. After a seemingly random number of such successful
> sessions (sometimes 30, sometimes hundreds), the backend mysteriously exits
> after the client calls PQsetdbLogin(), and the client hangs. Any clues?
> Details below...
>
> Client: C program linked with 7.2.1 libpq on HP-UX B.11.00 E 9000/803.
> Server: PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96.
>

How's the memory situation on the server box? Don't forget Linux has that GREAT feature that randomly kills processes when memory is tight.... Perhaps there's something in syslog.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
On Saturday October 11 2003 9:00, Jeff wrote:

> On Fri, 10 Oct 2003, Ed L. wrote:
> > I have a libpq client program that repeatedly connects to a DB, queries,
> > and then disconnects. After a seemingly random number of such
> > successful sessions (sometimes 30, sometimes hundreds), the backend
> > mysteriously exits after the client calls PQsetdbLogin(), and the
> > client hangs. Any clues? Details below...
> >
> > Client: C program linked with 7.2.1 libpq on HP-UX B.11.00 E 9000/803.
> > Server: PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96.
>
> How's the memory situation on the server box?
> Don't forget linux has that GREAT feature that randomly kills processes
> when memory is tight.... Perhaps there's something in syslog

Hmmm... it's quite repeatable, though it's not easy to predict after how many sessions it will happen, so I don't think it's that. It happens whether memory is tight or not.
On Friday October 10 2003 4:46, Ed L. wrote:

> I have a libpq client program that repeatedly connects to a DB, queries,
> and then disconnects. After a seemingly random number of such successful
> sessions (sometimes 30, sometimes hundreds), the backend mysteriously
> exits after the client calls PQsetdbLogin(), and the client hangs. Any
> clues? Details below...
>
> Client: C program linked with 7.2.1 libpq on HP-UX B.11.00 E 9000/803.
> Server: PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96.

Still looking for clues as to the cause of this repeatable connection failure. Passing an explicit connection timeout to PQconnectdb() escapes from any long hangs, but the hanging is still an issue I'd like to understand.

Attached is a small C program that reliably reproduces this problem on the setup above. I added an explicit timeout to PQconnectdb() to wait only 30 seconds. I'm curious to know if anyone can easily repeat the problem (careful, it will generate a bit of traffic and CPU load, and it runs forever). My last example run showed 17 timeouts seemingly randomly dispersed among 5000 consecutive connection attempts.

The server has plenty of available memory and is a dual-processor machine running Linux 2.4.18-3smp. I tried to catch snapshot data from netstat on Recv-Q and Send-Q sizes on the server during a hang... that's a little iffy with the timing of grepping netstat output, but it seems the server's Recv-Q's were always zero and the Send-Q's were occasionally in the tens (bytes?).

TIA for any help.
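For readers who want to try the timeout workaround mentioned above, here is a minimal sketch of building a conninfo string that carries a 30-second connect_timeout. The helper build_conninfo and the host name in the usage note are hypothetical and are not taken from the attached program; the sketch also assumes the client's libpq recognizes the connect_timeout keyword and that none of the values contain spaces or quotes (those would need conninfo quoting, which is not handled here).

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a libpq conninfo string that includes
 * connect_timeout (in seconds).
 * Returns 0 on success, -1 if the output was truncated or formatting failed.
 * Values are inserted verbatim; no quoting or escaping is performed.
 */
int build_conninfo(char *buf, size_t buflen,
                   const char *host, const char *port,
                   const char *dbname, const char *user,
                   int timeout_sec)
{
    int n = snprintf(buf, buflen,
                     "host=%s port=%s dbname=%s user=%s connect_timeout=%d",
                     host, port, dbname, user, timeout_sec);

    /* snprintf returns the length it wanted to write; >= buflen means truncation */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

A caller would fill a buffer with something like build_conninfo(buf, sizeof buf, "db.example.com", "5432", "mydb", "myuser", 30), pass buf to PQconnectdb(), and then check PQstatus() against CONNECTION_OK as in the original snippet.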
Attachments
Ed Loehr <ed@LoehrTech.com> writes:

> Attached is a small C program that reliably reproduces this
> problem on the setup above. I added an explicit timeout to PQconnectdb()
> to wait only 30 seconds. I'm curious to know if anyone can easily repeat
> the problem (careful, it will generate a bit of traffic, cpu load, and run
> forever). My last example run showed 17 timeouts seemingly randomly
> dispersed among 5000 consecutive connection attempts.

I tried to duplicate the problem, without success --- 20000 connection attempts without failure. Setup is HPUX 10.20 client, RHL8 server (2.4.18-24.8.0 kernel); but it's a single-processor machine, not dual as in your example. I was using CVS-tip PG sources, also.

regards, tom lane