Thread: backend exit mystery

backend exit mystery

From
"Ed L."
Date:
I have a libpq client program that repeatedly connects to a DB, queries, and
then disconnects.  After a seemingly random number of such successful
sessions (sometimes 30, sometimes hundreds), the backend mysteriously exits
after the client calls PQsetdbLogin(), and the client hangs.  Any clues?
Details below...

Client:  C program linked with 7.2.1 libpq on HP-UX B.11.00 E 9000/803.
Server:  PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96.

Client code snippet:

1    if (text_db_conn == NULL || PQstatus(text_db_conn) != CONNECTION_OK) {
2        if (text_db_conn!=NULL) PQfinish(text_db_conn);
3        fprintf(stderr,"Connecting to DB...\n");
4        fflush(stderr);
5        text_db_conn = PQsetdbLogin(IP, PORT, NULL, NULL,
6                        "mydb", "myuser", NULL);
7        if (PQstatus(text_db_conn) == CONNECTION_BAD) {
8            fprintf(stderr,"Connection attempt failed.\n");
9        } else {
10           fprintf(stderr,"Connected.\n");
11       }
12   }

Client hangs after line 5.  Client backtrace when hanging:

(gdb) bt
#0  0x20da28 in _select_sys ()
#1  0x1ec788 in select ()
#2  0xb9818 in pqWait ()
#3  0x4000f0e0 in __d_trap_fptr ()
#4  0x1ec788 in select ()
Error accessing memory address 0xffffffbf: Bad address.

Server log (with server_min_messages = debug5) shows:

2003-10-10 17:04:01 [28501]  DEBUG:  BackendStartup: forked pid=20296 socket=8
2003-10-10 17:04:01 [20296]  LOG:  connection received: host=10.0.1.1 port=61438
2003-10-10 17:05:34 [28501]  DEBUG:  reaping dead processes
2003-10-10 17:05:34 [28501]  DEBUG:  child process (pid 20296) exited with exit code 0

I attached to this backend before it exited, and got this backtrace:

(gdb) bt
#0  0x420e8182 in recv () from /lib/i686/libc.so.6
#1  0x081115c8 in secure_read (port=0x82be9f0, ptr=0x826f100, len=8192) at be-secure.c:301
#2  0x08115322 in pq_recvbuf () at pqcomm.c:463
#3  0x08115439 in pq_getbytes (s=0xbfffdd70 "Ho*\b8?*\b8???d\210\024", len=4) at pqcomm.c:538
#4  0x081472fa in ProcessStartupPacket (port=0x82be9f0, SSLdone=0 '\0') at postmaster.c:1094
#5  0x08148914 in DoBackend (port=0x82be9f0) at postmaster.c:2178
#6  0x081483ff in BackendStartup (port=0x82be9f0) at postmaster.c:1924
#7  0x081471f1 in ServerLoop () at postmaster.c:1027
#8  0x08146be6 in PostmasterMain (argc=4, argv=0x82a5ae8) at postmaster.c:788
#9  0x081160dc in main (argc=4, argv=0xbfffe8b4) at main.c:210
#10 0x42017499 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) p debug_query_string
$1 = 0x0


TIA.

Re: backend exit mystery

From
Jeff
Date:
On Fri, 10 Oct 2003, Ed L. wrote:

> I have a libpq client program that repeatedly connects to a DB, queries, and
> then disconnects.  After a seemingly random number of such successful
> sessions (sometimes 30, sometimes hundreds), the backend mysteriously exits
> after the client calls PQsetdbLogin(), and the client hangs.  Any clues?
> Details below...
>
> Client:  C program linked with 7.2.1 libpq on HP-UX B.11.00 E 9000/803.
> Server:  PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96.
>

How's the memory situation on the server box?
Don't forget Linux has that GREAT feature that randomly kills processes
when memory is tight.... Perhaps there's something in syslog.
--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: backend exit mystery

From
"Ed L."
Date:
On Saturday October 11 2003 9:00, Jeff wrote:
> On Fri, 10 Oct 2003, Ed L. wrote:
> > I have a libpq client program that repeatedly connects to a DB, queries,
> > and then disconnects.  After a seemingly random number of such
> > successful sessions (sometimes 30, sometimes hundreds), the backend
> > mysteriously exits after the client calls PQsetdbLogin(), and the
> > client hangs.  Any clues? Details below...
> >
> > Client:  C program linked with 7.2.1 libpq on HP-UX B.11.00 E 9000/803.
> > Server:  PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96.
>
> How's the memory situation on the server box?
> Don't forget linux has that GREAT feature that randomly kills processes
> when memory is tight.... Perhaps there's something in syslog

Hmmm... it's quite repeatable, though not easy to predict after how many
sessions, so I don't think it's that.  Happens whether memory is tight or
not.

Re: backend exit mystery

From
Ed Loehr
Date:
On Friday October 10 2003 4:46, Ed L. wrote:
> I have a libpq client program that repeatedly connects to a DB, queries,
> and then disconnects.  After a seemingly random number of such successful
> sessions (sometimes 30, sometimes hundreds), the backend mysteriously
> exits after the client calls PQsetdbLogin(), and the client hangs.  Any
> clues? Details below...
>
> Client:  C program linked with 7.2.1 libpq on HP-UX B.11.00 E 9000/803.
> Server:  PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96.

Still looking for clues as to the cause of this repeatable connection
failure.  Passing an explicit connection timeout to PQconnectdb() escapes
from any long hangs, but the hanging is still an issue I'd like to
understand.  Attached is a small C program that reliably reproduces this
problem on the setup above.  I added an explicit timeout to PQconnectdb()
to wait only 30 seconds.  I'm curious to know if anyone can easily repeat
the problem (careful, it will generate a bit of traffic, cpu load, and run
forever).  My last example run showed 17 timeouts seemingly randomly
dispersed among 5000 consecutive connection attempts.
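
For readers who want to try the same workaround: libpq's connection string accepts a connect_timeout parameter, given in seconds, in libpq releases that support it. The sketch below only builds such a string and is not taken from the attached program; the helper name build_conninfo and the host and port values in the usage note are placeholders chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a libpq conninfo string that includes
 * connect_timeout (seconds). Values are inserted verbatim, so this
 * assumes they contain no spaces or quotes.
 * Returns 0 on success, -1 if buf is too small. */
int build_conninfo(char *buf, size_t buflen,
                   const char *host, const char *port,
                   const char *dbname, const char *user,
                   int timeout_sec)
{
    int n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen,
                 "host=%s port=%s dbname=%s user=%s connect_timeout=%d",
                 host, port, dbname, user, timeout_sec);
    /* snprintf returns the length it needed; reject truncated output. */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

The resulting string would then be handed to PQconnectdb(), for example after build_conninfo(buf, sizeof buf, "dbhost", "5432", "mydb", "myuser", 30) for the 30-second limit described here.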

The server has plenty of available memory on a dual processor machine
running Linux 2.4.18-3smp.  Tried to catch snapshot data from netstat on
Recv-Q and Send-Q sizes on the server during a hang... that's a little iffy
with the timing of grepping netstat output, but it seems the server's
Recv-Qs were always zero and the Send-Qs were occasionally in the tens
(bytes?).

TIA for any help.

Attachments

Re: backend exit mystery

From
Tom Lane
Date:
Ed Loehr <ed@LoehrTech.com> writes:
> Attached is a small C program that reliably reproduces this
> problem on the setup above.  I added an explicit timeout to PQconnectdb()
> to wait only 30 seconds.  I'm curious to know if anyone can easily repeat
> the problem (careful, it will generate a bit of traffic, cpu load, and run
> forever).  My last example run showed 17 timeouts seemingly randomly
> dispersed among 5000 consecutive connection attempts.

I tried to duplicate the problem, without success --- 20000 connection
attempts without failure.  Setup is HPUX 10.20 client, RHL8 server
(2.4.18-24.8.0 kernel); but it's a single-processor machine, not dual as
in your example.  I was using CVS-tip PG sources, also.

            regards, tom lane