Обсуждение: SQL command to dump the contents of table failed: PQendcopy() failed. Error message from server: socket not open
Hi: I run a cron job every day to dump all the tables in my 7.4 Postgres database. For one of the tables (sample) the command returns this error: -bash-2.05b$ /usr/local/pgsql/bin/pg_dump -Fc -t sample heos -f /home/bu/5/sample.dump pg_dump: socket not open pg_dump: SQL command to dump the contents of table "sample" failed: PQendcopy() failed. pg_dump: Error message from server: socket not open pg_dump: The command was: COPY public.sample (field1, field2, field3, field4, field5) TO stdout; This is what the log shows: -bash-2.05b$ more /usr/local/pgsql/logfile LOG: database system was shut down at 2005-09-23 13:43:16 CEST LOG: checkpoint record is at 122/E90B52AC LOG: redo record is at 122/E90B52AC; undo record is at 0/0; shutdown TRUE LOG: next transaction ID: 2227508; next OID: 30689327 LOG: database system is ready LOG: server process (PID 27688) was terminated by signal 11 LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. LOG: all server processes terminated; reinitializing LOG: database system was interrupted at 2005-09-23 13:54:37 CEST LOG: checkpoint record is at 122/E90C3D38 LOG: redo record is at 122/E90C3D38; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 2227617; next OID: 30697519 LOG: database system was not properly shut down; automatic recovery in progress LOG: record with zero length at 122/E90C3D78 LOG: redo is not required LOG: database system is ready This happens every day consistently. The table is quite big, about 1 million tuples and it is vacuumed. Any help would be appreciated. Cheers, Ruben.
On Fri, Sep 23, 2005 at 01:34:18PM +0200, ruben wrote: > > LOG: server process (PID 27688) was terminated by signal 11 This suggests a bug in the backend. There should be a core dump somewhere under $PGDATA (unless resource limits prevent it or your system is configured to put core dumps elsewhere) -- can you use a debugger to get a stack trace from it? What exact version of PostgreSQL are you running and on what operating system? Do you have any non-standard extensions to PostgreSQL (custom types, third-party modules, etc.)? -- Michael Fuhr
Hi Michael: The operating system is Red Hat Linux release 8.0 (Psyche) and PostgreSQL version is 7.4.6., without non-standard extensions. I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I don't know how to debug it to get a stack trace, I'll find out). Thanks, Ruben. Michael Fuhr wrote: > On Fri, Sep 23, 2005 at 01:34:18PM +0200, ruben wrote: > >>LOG: server process (PID 27688) was terminated by signal 11 > > > This suggests a bug in the backend. There should be a core dump > somewhere under $PGDATA (unless resource limits prevent it or your > system is configured to put core dumps elsewhere) -- can you use a > debugger to get a stack trace from it? What exact version of > PostgreSQL are you running and on what operating system? Do you > have any non-standard extensions to PostgreSQL (custom types, > third-party modules, etc.)? >
ruben <ruben20@superguai.com> writes: > I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I > don't know how to debug it to get a stack trace, I'll find out). It would normally be in the per-database subdirectory ($PGDATA/base/NNN/) for the database where the problem occurs. However, if you don't see a core file there either, that probably means the postmaster was started under "ulimit -c 0" to prevent core dumps. Add "ulimit -c unlimited" to the postmaster start script and restart it. (For the sake of the archives, I'll mention that as of PG 8.1 core dumps will appear directly in $PGDATA, not in its subdirectories.) regards, tom lane
On Fri, Sep 23, 2005 at 05:11:19PM +0200, ruben wrote: > The operating system is Red Hat Linux release 8.0 (Psyche) and > PostgreSQL version is 7.4.6., without non-standard extensions. > > I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I > don't know how to debug it to get a stack trace, I'll find out). Did you look everywhere under $PGDATA or just in that directory? As I recall, released versions of PostgreSQL usually dump core under $PGDATA/base/<database oid>. However, it's also possible that your coredumpsize resource limit prevents core dumps; you could fix that by putting a command like "ulimit -c unlimited" in your PostgreSQL startup script and then stopping and restarting PostgreSQL. Once you have a core dump, you can get a stack trace with gdb: $ gdb /path/to/postgres /path/to/core ... (gdb) bt If your postgres binary was built with debugging symbols then the stack trace should show function names, file names, and line numbers. Can you duplicate the backend crash from psql if you issue the COPY command that pg_dump complained about? What about if you issue a SELECT for all records in the table? What does "\d tablename" show for the table in question? -- Michael Fuhr
Thanks Tom and Michael: Michael Fuhr wrote: > On Fri, Sep 23, 2005 at 05:11:19PM +0200, ruben wrote: > >>The operating system is Red Hat Linux release 8.0 (Psyche) and >>PostgreSQL version is 7.4.6., without non-standard extensions. >> >>I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I >>don't know how to debug it to get a stack trace, I'll find out). > > > Did you look everywhere under $PGDATA or just in that directory? > As I recall, released versions of PostgreSQL usually dump core under > $PGDATA/base/<database oid>. However, it's also possible that your > coredumpsize resource limit prevents core dumps; you could fix that > by putting a command like "ulimit -c unlimited" in your PostgreSQL > startup script and then stopping and restarting PostgreSQL. > > Once you have a core dump, you can get a stack trace with gdb: > > $ gdb /path/to/postgres /path/to/core > ... > (gdb) bt > > If your postgres binary was built with debugging symbols then the > stack trace should show function names, file names, and line numbers. I cannot find a core file, ulimit is set to unlimit. I guess I'm doing something wrong: -bash-2.05b$ ulimit unlimited -bash-2.05b$ find /usr/local/pgsql -name '*core*' -print -bash-2.05b$ > Can you duplicate the backend crash from psql if you issue the COPY > command that pg_dump complained about? -bash-2.05b$ /usr/local/pgsql/bin/pg_dump -Fc -t llamadas heos -f /home/buheos/5/llamadas3.dump pg_dump: socket not open pg_dump: SQL command to dump the contents of table "llamadas" failed: PQendcopy() failed. pg_dump: Error message from server: socket not open pg_dump: The command was: COPY public.llamadas (cod_empresa, fecha_llamada, tfno_origen, tfno_destino, duracion_llamada, hora_llamada, cod_destino_llamada, cod_pais_destino, cod_destino_internacional, franja_horaria, importe, cod_fuente, precio_coste_llamada, observaciones_llamada, coment_llamada, fecha_factura, num_factura, fecha_alta, fecha_ult_mod, fecha_sis_alta, usuario_alta, i_a_alta, fecha_sis_ult_mod, usuario_ult_mod, i_a_ult_mod, periodicidad_facturacion, cod_operador, franja_horaria_operador, fichero_origen, cod_destino_internacional_operador) TO stdout; > What about if you issue a SELECT for all records in the table? heos=# select * from llamadas; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. Failed. In the logfile: LOG: server process (PID 7069) was terminated by signal 11 LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. LOG: all server processes terminated; reinitializing LOG: database system was interrupted at 2005-09-26 16:49:43 CEST LOG: checkpoint record is at 125/858E0144 LOG: redo record is at 125/858E0144; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 2270061; next OID: 30820346 LOG: database system was not properly shut down; automatic recovery in progress LOG: record with zero length at 125/858E0184 LOG: redo is not required LOG: database system is ready LOG: unexpected EOF on client connection > What does "\d tablename" show for the table in question? heos=# \d llamadas; Table "public.llamadas" Column | Type | Modifiers ------------------------------------+--------------------------+---------------------------------------------------- cod_empresa | smallint | not null fecha_llamada | date | tfno_origen | character(15) | tfno_destino | character(15) | duracion_llamada | integer | hora_llamada | time without time zone | default ('now'::text)::time(6) with time zone cod_destino_llamada | character(1) | cod_pais_destino | integer | cod_destino_internacional | character(15) | franja_horaria | character(1) | importe | real | cod_fuente | integer | precio_coste_llamada | real | observaciones_llamada | character varying(100) | coment_llamada | character varying(100) | fecha_factura | date | num_factura | integer | fecha_alta | date | fecha_ult_mod | date | fecha_sis_alta | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone usuario_alta | character(10) | i_a_alta | character(15) | fecha_sis_ult_mod | timestamp with time zone | usuario_ult_mod | character(10) | i_a_ult_mod | character(15) | periodicidad_facturacion | character(1) | cod_operador | character(2) | franja_horaria_operador | character(1) | fichero_origen | character varying(100) | cod_destino_internacional_operador | character(15) | Indexes: "llamadas_i01" btree (cod_empresa, fecha_llamada, tfno_origen)
ruben <ruben20@superguai.com> writes: >> As I recall, released versions of PostgreSQL usually dump core under >> $PGDATA/base/<database oid>. However, it's also possible that your >> coredumpsize resource limit prevents core dumps; you could fix that >> by putting a command like "ulimit -c unlimited" in your PostgreSQL >> startup script and then stopping and restarting PostgreSQL. > I cannot find a core file, ulimit is set to unlimit. I guess I'm doing > something wrong: > -bash-2.05b$ ulimit > unlimited > -bash-2.05b$ find /usr/local/pgsql -name '*core*' -print (1) The fact that it's unlimited in your user environment doesn't prove that it's unlimited in the environment the postmaster is started in. (2) I forget which constraint ulimit-with-no-argument prints, but it's not core file size. (Try "ulimit -a") Please actually follow the advice given to you above. regards, tom lane
On Mon, Sep 26, 2005 at 07:03:06PM +0200, ruben wrote: > > I guess I'm doing something wrong: > > -bash-2.05b$ ulimit > unlimited Please read manpages, in this case bash: ulimit -a Peter
Hi Tom: No way to get the core dump, this is what I did: 1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start in the "start" section of the script: ... ulimit -c unlimited su -l postgres -s /bin/sh -c "/usr/local/pgsql/bin/pg_ctl -D $PGDATA -p /usr/local/pgsql/bin/postmaster -l $DIRLOG/logfile start > /dev/null 2>&1" < /dev/null ... I tried "man ulimit" but it didn't work, "man bash" returns the command help. 2) Restart postmaster /etc/rc.d/init.d/postgresql restart 3) Run the query that produces the crash. 4) Search core file: -bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print -bash-2.05b$ Nothing comes up. Thanks for your help. Ruben. Tom Lane wrote: > ruben <ruben20@superguai.com> writes: > >>>As I recall, released versions of PostgreSQL usually dump core under >>>$PGDATA/base/<database oid>. However, it's also possible that your >>>coredumpsize resource limit prevents core dumps; you could fix that >>>by putting a command like "ulimit -c unlimited" in your PostgreSQL >>>startup script and then stopping and restarting PostgreSQL. > > >>I cannot find a core file, ulimit is set to unlimit. I guess I'm doing >>something wrong: > > >>-bash-2.05b$ ulimit >>unlimited >>-bash-2.05b$ find /usr/local/pgsql -name '*core*' -print > > > (1) The fact that it's unlimited in your user environment doesn't prove > that it's unlimited in the environment the postmaster is started in. > > (2) I forget which constraint ulimit-with-no-argument prints, but it's > not core file size. (Try "ulimit -a") > > Please actually follow the advice given to you above. > > regards, tom lane >
On Wed, Sep 28, 2005 at 11:40:11AM +0200, ruben wrote: > Hi Tom: > > No way to get the core dump, this is what I did: > > 1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start > in the "start" section of the script: <snip> > 2) Restart postmaster > > /etc/rc.d/init.d/postgresql restart Does that run the start section? Maybe you need to stop/start. Maybe the bash_profile/bashrc for the postgres user resets the core limit. > 4) Search core file: > > -bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print > -bash-2.05b$ It'll be under $PGDATA, where is that? It may be under /var/lib/postgres, depending on how you installed... -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Вложения
Thanks Martijn: Martijn van Oosterhout wrote: > On Wed, Sep 28, 2005 at 11:40:11AM +0200, ruben wrote: > >>Hi Tom: >> >>No way to get the core dump, this is what I did: >> >>1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start >>in the "start" section of the script: > > > <snip> > >>2) Restart postmaster >> >> /etc/rc.d/init.d/postgresql restart > > > Does that run the start section? Maybe you need to stop/start. Maybe > the bash_profile/bashrc for the postgres user resets the core limit. The server has been rebooted. This is the postgres start script ($PGDATA is /usr/local/pgsql/): http://80.33.3.245/temp/postgres.txt Regading the core limit: -bash-2.05b$ whoami postgres -bash-2.05b$ cat ~/.bash_profile PGDATA=/var/lib/pgsql/data [ -f $PGDATA/../initdb.i18n ] && source $PGDATA/../initdb.i18n export PGDATA -bash-2.05b$ ulimit -a core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited file size (blocks, -f) unlimited max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 7168 virtual memory (kbytes, -v) unlimited >>4) Search core file: >> >> -bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print >> -bash-2.05b$ > > > It'll be under $PGDATA, where is that? It may be under > /var/lib/postgres, depending on how you installed... Indeed I searched the whole file structure for core files. Regards.