Thread: pq_recvbuf: unexpected EOF

pq_recvbuf: unexpected EOF

From
David Link
Date:
Hi

We went live a month back with very low volume (so far) on a
subscription-based website, and we are very happy.  The app is built on a
100% open source toolstack, with PostgreSQL (of course) at its heart.  I
can't tell you how pleased I am with Pg (coming from 5 years' experience
with Oracle).

I'm writing to discuss some scaling issues.  After bombarding the
system using HTTPD::Bench::ApacheBench (from CPAN), she starts to fail
somewhere around 30-50 concurrent requests, depending on the query.
See error messages below.

Any advice would be helpful.
Thanks.  David Link, White Plains, NY

Specs:
 h/w: Dual AMD Athlon
 Red Hat 8.0 / Linux 2.4.7
 Apache 1.3.27 (w/ mod_perl)
 PostgreSQL 7.3.1
 perl 5.8

DB Specs:
  Size: 10 GB
  tables: 280 / largest: 550,000 tuples / 25,000 relpages (8K)
  indexes: 650

ERROR:

[4] LOG:  pq_recvbuf: unexpected EOF on client connection
[4] LOG:  server process (pid 28353) was terminated by signal 11
[5] LOG:  terminating any other active server processes
[4-1] WARNING:  Message from PostgreSQL backend:
[4-2] ^IThe Postmaster has informed me that some other backend
[4-3] ^Idied abnormally and possibly corrupted shared memory.
[4-4] ^II have rolled back the current transaction and am
[4-5] ^Igoing to terminate your database system connection and exit.
[4-6] ^IPlease reconnect to the database system and repeat your query.
[4-1] WARNING:  Message from PostgreSQL backend:
[4-6] ^IPlease reconnect to the database system and repeat your query.
[4-2] ^IThe Postmaster has informed me that some other backend
[4-3] ^Idied abnormally and possibly corrupted shared memory.
[4-4] ^II have rolled back the current transaction and am
[4-5] ^Igoing to terminate your database system connection and exit.
[4-6] ^IPlease reconnect to the database system and repeat your query.
[6] LOG:  all server processes terminated; reinitializing shared memory and semaphores
[7] LOG:  database system was interrupted at 2003-04-24 16:41:41 EDT
[8] LOG:  checkpoint record is at D/6BF3E0F4
[9] LOG:  redo record is at D/6BF3E0F4; undo record is at 0/0; shutdown FALSE
[10] LOG:  next transaction id: 27240895; next oid: 217108262
[11] LOG:  database system was not properly shut down; automatic recovery in progress
[12] LOG:  ReadRecord: record with zero length at D/6BF3E134
[13] LOG:  redo is not required
[7] FATAL:  The database system is starting up
[14] LOG:  database system is ready


__________________________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo
http://search.yahoo.com


Re: pq_recvbuf: unexpected EOF

From
Tom Lane
Date:
David Link <dvlink@yahoo.com> writes:
> [4] LOG:  pq_recvbuf: unexpected EOF on client connection
> [4] LOG:  server process (pid 28353) was terminated by signal 11
> [5] LOG:  terminating any other active server processes

That's not what I would call a "scaling issue" :-(

Can you get us a stack backtrace from the crashed backend?  Since
this is linux, you probably aren't getting core files (look for
$PGDATA/base/nnnnnn/core) --- if not, restart the postmaster under
"ulimit -c unlimited" (best to add that to its startup script).
Once you have a core file, try

    gdb /path/to/postgres-executable /path/to/core-file
    gdb> bt
    gdb> p debug_query_string
    gdb> quit

It would also be worthwhile to enable log_statement in postgresql.conf
so that you can get more info on the command or series of commands that
leads up to the crash.
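
The core-file steps above can be sketched as a small shell helper. This is
only an illustration: the function names find_cores and gdb_bt_command are
made up here, and the -batch and -ex options assume a reasonably current gdb
(an older gdb such as 5.0 may not accept -ex). The helper only locates core
files and prints the gdb command; it does not run gdb.

```shell
#!/bin/sh
# Hypothetical helper: list core files below a data directory's base/ subtree.
find_cores() {
    # $1: data directory (for example "$PGDATA")
    find "$1/base" -type f -name 'core*' 2>/dev/null
}

# Hypothetical helper: print a non-interactive gdb command for one core file.
# Assumes a gdb that supports -batch and -ex.
gdb_bt_command() {
    # $1: postgres executable, $2: core file
    printf 'gdb -batch -ex bt -ex "p debug_query_string" %s %s\n' "$1" "$2"
}

# Usage (paths are placeholders):
#   for core in $(find_cores "$PGDATA"); do
#       gdb_bt_command /path/to/postgres-executable "$core"
#   done
```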

Also, if you compiled Postgres yourself, it would be worth the trouble
to recompile with --enable-debug and --enable-cassert added to whatever
configure parameters you used before.  I'm not sure what it takes to do
the equivalent in an RPM-based installation.
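
For a source build, the recompile step amounts to putting the two flags in
front of the original configure parameters. A minimal sketch, assuming it is
run from the top of the PostgreSQL source tree; debug_configure_cmd is a
made-up name, and it only prints the command line so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print a configure command that adds the two debugging
# flags to whatever parameters were used for the original build.
debug_configure_cmd() {
    printf './configure --enable-debug --enable-cassert'
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Usage (the prefix is only an example):
#   debug_configure_cmd --prefix=/usr/local/pgsql
```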

            regards, tom lane


Re: pq_recvbuf: unexpected EOF

From
"scott.marlowe"
Date:
On Fri, 25 Apr 2003, David Link wrote:

> Hi
>
> We went live a month back with very low volume (so far) on a
> subscription-based website, and we are very happy.  The app is built on a
> 100% open source toolstack, with PostgreSQL (of course) at its heart.  I
> can't tell you how pleased I am with Pg (coming from 5 years' experience
> with Oracle).
>
> I'm writing to discuss some scaling issues.  After bombarding the
> system using HTTPD::Bench::ApacheBench (from CPAN), she starts to fail
> somewhere around 30-50 concurrent requests, depending on the query.
> See error messages below.
>
> Any advice would be helpful.
> Thanks.  David Link, White Plains, NY
>
> Specs:
>  h/w: Dual AMD Athlon
>  Red Hat 8.0 / Linux 2.4.7
>  Apache 1.3.27 (w/ mod_perl)
>  PostgreSQL 7.3.1
>  perl 5.8
>
> DB Specs:
>   Size: 10 GB
>   tables: 280 / largest: 550,000 tuples / 25,000 relpages (8K)
>   indexes: 650
>
> ERROR:
>
> [4] LOG:  pq_recvbuf: unexpected EOF on client connection
> [4] LOG:  server process (pid 28353) was terminated by signal 11

Sig 11 usually means flakey hardware.  Check out www.memtest86.com and do
a couple dozen kernel or postgresql compiles to see if they get sig 11s
while running.

It's not uncommon for an error to be in a relatively unused spot of
memory, and only show up under heavy load as causing problems.
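
The repeated-compile test can be sketched as a loop that counts how many runs
die from signal 11. This assumes a shell such as bash or dash, which report a
signal death as exit status 128 plus the signal number (139 for SIGSEGV);
count_segv is a made-up name, and the build command in the usage note is a
placeholder.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs were
# killed by SIGSEGV (exit status 139 in bash and dash).
count_segv() {
    runs=$1
    shift
    segv=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 139 ]; then
            segv=$((segv + 1))
        fi
        i=$((i + 1))
    done
    echo "$segv"
}

# Usage (placeholder build command, run from a source tree):
#   count_segv 24 sh -c 'make clean && make'
```

A nonzero count on a build that normally succeeds would point toward
hardware rather than the software being compiled.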


Re: pq_recvbuf: unexpected EOF

From
"scott.marlowe"
Date:
On Fri, 25 Apr 2003, Tom Lane wrote:

> David Link <dvlink@yahoo.com> writes:
> > [4] LOG:  pq_recvbuf: unexpected EOF on client connection
> > [4] LOG:  server process (pid 28353) was terminated by signal 11
> > [5] LOG:  terminating any other active server processes
>
> That's not what I would call a "scaling issue" :-(

FYI, Sig 11s are almost always caused by bad hardware, or badly built code
(i.e. the build process broke somewhere, often bad memory caused the code
to get compiled with a bit or two flipped in odd places).


Re: pq_recvbuf: unexpected EOF

From
David Link
Date:
Running with ulimit ...

  echo "setting max user processes to unlimited..."
  ulimit -u unlimited
  echo -n "$PSQL_START"
  su -l postgres -s /bin/sh -c "$PGLIB/bin/pg_ctl -D $PGDATA -p $PGLIB/bin/postmaster -o '-i' start > $PGDATA/log/syslog 2>&1" < /dev/null

but still no core.  I can't produce that bt.  Any suggestions?




Re: pq_recvbuf: unexpected EOF

From
Alvaro Herrera
Date:
On Fri, Apr 25, 2003 at 02:32:20PM -0700, David Link wrote:
> Running with ulimit ...
>
>   echo "setting max user processes to unlimited..."
>   ulimit -u unlimited
>       echo -n "$PSQL_START"
>       su -l postgres -s /bin/sh -c "$PGLIB/bin/pg_ctl -D $PGDATA -p
> $PGLIB/bin/postmaster -o '-i' start > $PGDATA/log/syslog 2>&1" <
> /dev/null
>
> but still no core.  I can't produce that bt.  Any suggestions?

Try ulimit -c...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)


Re: pq_recvbuf: unexpected EOF

From
Joe Conway
Date:
Tom Lane wrote:
> Also, if you compiled Postgres yourself, it would be worth the trouble
> to recompile with --enable-debug and --enable-cassert added to whatever
> configure parameters you used before.  I'm not sure what it takes to do
> the equivalent in an RPM-based installation.
>

Get the source RPM and install it. Then read
/usr/share/doc/postgresql-7.3.2/README.rpm-dist, specifically the
section called "REBUILDING FROM SOURCE RPM". The executive summary is
you run something like this:

rpm --rebuild --define 'beta 1' postgresql-7.3.2-1PGDG.src.rpm

On Red Hat 8 & 9, use `rpmbuild` instead of `rpm`.

The "beta" option compiles with --enable-debug and --enable-cassert. It
also disables stripping of symbols from the binaries.

Joe


Re: pq_recvbuf: unexpected EOF

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Fri, Apr 25, 2003 at 02:32:20PM -0700, David Link wrote:
>> Running with ulimit ...
>>
>> echo "setting max user processes to unlimited..."
>> ulimit -u unlimited
>> echo -n "$PSQL_START"
>> su -l postgres -s /bin/sh -c "$PGLIB/bin/pg_ctl -D $PGDATA -p
>> $PGLIB/bin/postmaster -o '-i' start > $PGDATA/log/syslog 2>&1" <
>> /dev/null
>>
>> but still no core.  I can't produce that bt.  Any suggestions?

> Try ulimit -c...

Also, the "su" very probably negates any environment settings from the
outside shell script anyway.  It might work to put the ulimit command
into the postgres user's ~/.profile (or local equivalent).

Failing that, put it into the pg_ctl script ...

            regards, tom lane


Re: pq_recvbuf: unexpected EOF

From
David Link
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > On Fri, Apr 25, 2003 at 02:32:20PM -0700, David Link wrote:
> >> Running with ulimit ...
> >>
> >> echo "setting max user processes to unlimited..."
> >> ulimit -u unlimited
> >> echo -n "$PSQL_START"
> >> su -l postgres -s /bin/sh -c "$PGLIB/bin/pg_ctl -D $PGDATA -p
> >> $PGLIB/bin/postmaster -o '-i' start > $PGDATA/log/syslog 2>&1" <
> >> /dev/null
> >>
> >> but still no core.  I can't produce that bt.  Any suggestions?
>
> > Try ulimit -c...
>
> Also, the "su" very probably negates any environment settings from
> the
> outside shell script anyway.  It might work to put the ulimit command
> into the postgres user's ~/.profile (or local equivalent).
>

Interestingly enough, it doesn't negate it.  I tested by printing the
output of 'ulimit -a' just prior to the call to postmaster within su's -c
statement.
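
Resource limits are inherited by child processes, which fits this
observation. A minimal sketch for checking that, plus one way to make the
core limit independent of the calling script by setting it inside the command
given to su. Both function names are made up, the paths are placeholders, and
whether 'su -l' resets limits may depend on the system's login and PAM
configuration.

```shell
#!/bin/sh
# Hypothetical check: print the core-size limit as seen by a child shell.
child_core_limit() {
    sh -c 'ulimit -c'
}

# Hypothetical helper: print an su command that raises the core limit inside
# the shell started by su, just before pg_ctl is run.
su_start_command() {
    # $1: PostgreSQL install prefix, $2: data directory
    printf 'su -l postgres -s /bin/sh -c "ulimit -c unlimited; %s/bin/pg_ctl -D %s start"\n' "$1" "$2"
}

# Usage:
#   (ulimit -c 0; child_core_limit)     # the child reports the parent's limit
#   su_start_command /usr/local/pgsql "$PGDATA"
```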



Re: pq_recvbuf: unexpected EOF

From
David Link
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > On Fri, Apr 25, 2003 at 02:32:20PM -0700, David Link wrote:
> >> Running with ulimit ...
> >>
> >> echo "setting max user processes to unlimited..."
> >> ulimit -u unlimited
> >> echo -n "$PSQL_START"
> >> su -l postgres -s /bin/sh -c "$PGLIB/bin/pg_ctl -D $PGDATA -p
> >> $PGLIB/bin/postmaster -o '-i' start > $PGDATA/log/syslog 2>&1" <
> >> /dev/null
> >>
> >> but still no core.  I can't produce that bt.  Any suggestions?
>
> > Try ulimit -c...
>

Thank you for helping me produce a core dump file. (ulimit -c worked)
Still trying to produce the backtrace ...

   gdb /path/to/postgres-executable /path/to/core-file

not working so well.  ...

[root@cairo root]# gdb /usr/local/pgsql/bin/postgres $PGDATA/base/75479566/core
GNU gdb Red Hat Linux 7.x (5.0rh-15) (MI_OUT)
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...

warning: core file may not match specified executable file.
Core was generated by `postgres: video puma 127.0.0.1 SELE'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4...done.
Loaded symbols for /usr/lib/libreadline.so.4
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0  0x0810e9a3 in ReleaseAndReadBuffer () at eval.c:41
41      eval.c: No such file or directory.
        in eval.c
(gdb) bt
#0  0x0810e9a3 in ReleaseAndReadBuffer () at eval.c:41
Cannot access memory at address 0xafffec38
(gdb)


Sorry, I don't know gdb that well.




Re: pq_recvbuf: unexpected EOF

From
Tom Lane
Date:
David Link <dvlink@yahoo.com> writes:
> warning: core file may not match specified executable file.

That seems suspicious.  You sure you pointed gdb at the correct postgres
executable?

> #0  0x0810e9a3 in ReleaseAndReadBuffer () at eval.c:41

This is pretty obviously bogus, since ReleaseAndReadBuffer isn't in eval.c.

It might be that you will need to rebuild postgres with debugging
symbols before you will get a useful backtrace.  I have noticed that on
some platforms, gdb's backtrace from a non-debug-enabled executable
is not just incomplete but flat-out wrong.  This looks like it could
be one of those cases.

            regards, tom lane


Re: pq_recvbuf: unexpected EOF

From
David Link
Date:
Here is the backtrace.
(I rebuilt using --enable-debug and --enable-cassert.
I ran 'ulimit -u unlimited' as root before su'ing to postgres and
starting pg_ctl, and
I ran 'ulimit -c unlimited' in the pg_ctl script itself, as postgres.)

(gdb) bt
#0  0x4010ba01 in __kill () from /lib/i686/libc.so.6
#1  0x4010b7da in raise (sig=6) at ../sysdeps/posix/raise.c:27
#2  0x4010cf82 in abort () at ../sysdeps/generic/abort.c:88
#3  0x0817b285 in ExceptionalCondition () at assert.c:46
#4  0x081263e6 in UnpinBuffer (buf=0x402a6ea8) at freelist.c:147
#5  0x08125a64 in ReleaseBuffer (buffer=49) at bufmgr.c:1510
#6  0x080e47e7 in ExecClearTuple (slot=0x82d5aec) at execTuples.c:416
#7  0x080e4633 in ExecStoreTuple (tuple=0x82d5cb8, slot=0x82d5aec,
    buffer=2013, shouldFree=0 '\000') at execTuples.c:359
#8  0x080ea56f in SeqNext (node=0x82d5854) at nodeSeqscan.c:108
#9  0x080e4309 in ExecScan (node=0x82d5854, accessMtd=0x80ea4e4 <SeqNext>)
    at execScan.c:96
#10 0x080ea58b in ExecSeqScan (node=0x82d5854) at nodeSeqscan.c:133
#11 0x080e1dc9 in ExecProcNode (node=0x82d5854, parent=0x82d58e0)
    at execProcnode.c:291
#12 0x080e677a in ExecAgg (node=0x82d58e0) at nodeAgg.c:590
#13 0x080e1eb9 in ExecProcNode (node=0x82d58e0, parent=0x0)
    at execProcnode.c:357
#14 0x080e0b83 in ExecutePlan (estate=0x82d59c0, plan=0x82d58e0,
    operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection,
    destfunc=0x82d5e80) at execMain.c:954
#15 0x080e01bd in ExecutorRun (queryDesc=0x82d524c, estate=0x82d59c0,
    direction=ForwardScanDirection, count=0) at execMain.c:195
#16 0x081334bf in ProcessQuery (parsetree=0x82cf0c4, plan=0x82d58e0,
    dest=Remote, completionTag=0xbfffef30 "") at pquery.c:242
#17 0x08131aa4 in pg_exec_query_string (query_string=0x82cef9c, dest=Remote,
    parse_context=0x82c3fb4) at postgres.c:838
#18 0x08132bc1 in PostgresMain (argc=4, argv=0xbffff160,
    username=0x8270321 "video") at postgres.c:2016
#19 0x081181c0 in DoBackend (port=0x82701f0) at postmaster.c:2293
#20 0x08117b12 in BackendStartup (port=0x82701f0) at postmaster.c:1915
#21 0x08116d75 in ServerLoop () at postmaster.c:1018
#22 0x081168ca in PostmasterMain (argc=2, argv=0x8256eb8) at postmaster.c:779
#23 0x080f3907 in main (argc=2, argv=0xbffffaf4) at main.c:210
#24 0x400f9507 in __libc_start_main (main=0x80f3728 <main>, argc=2,
    ubp_av=0xbffffaf4, init=0x806a828 <_init>, fini=0x818db20 <_fini>,
    rtld_fini=0x4000dc14 <_dl_fini>, stack_end=0xbffffaec)
    at ../sysdeps/generic/libc-start.c:129
(gdb)
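
A long trace like this is easier to scan when reduced to frame numbers and
function names. A minimal sketch using awk, assuming frames of the form
"#N  0xADDR in function (...)" as above; bt_functions is a made-up name.
In this trace, frames #1 to #3 (raise with sig=6, abort,
ExceptionalCondition) suggest the backend stopped on a failed assertion
reached from UnpinBuffer.

```shell
#!/bin/sh
# Hypothetical helper: read gdb "bt" output on stdin and print
# "<frame number> <function name>" for each frame line.
# Continuation lines (those not starting with '#') are skipped.
bt_functions() {
    awk '/^#[0-9]+/ && $3 == "in" { sub(/^#/, "", $1); print $1, $4 }'
}

# Usage (placeholder file name):
#   bt_functions < backtrace.txt
```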




Re: RESOLUTION: pq_recvbuf: unexpected EOF

From
David Link
Date:
Hello,

RESOLUTION TO THE PROBLEM:

There was a problem with some memory SIMMs.  Memtest spotted it (great
program, thanks), and then we confirmed it.  Similar testing run on
another identical box produced no errors.

Thank you all, Tom, Scott, Alvaro, and Joe, for the amazing support.


