Обсуждение: BUG #8578: loading a 33G (compressed) pg_dump into a fresh host and db instance crashes a postgresql process
BUG #8578: loading a 33G (compressed) pg_dump into a fresh host and db instance crashes a postgresql process
От
robert@urban4m.com
Дата:
The following bug has been logged on the website: Bug reference: 8578 Logged by: Robert Nix Email address: robert@urban4m.com PostgreSQL version: 9.3.1 Operating system: Linux Ubuntu 12.10 Description: Below are the logs revealing the failure: 2013-11-05 17:04:21 EST LOG: server process (PID 5869) was terminated by signal 9: Killed 2013-11-05 17:04:21 EST DETAIL: Failed process was running: autovacuum: ANALYZE data_partitions.zone_municipality_demog_houston 2013-11-05 17:04:21 EST LOG: terminating any other active server processes 2013-11-05 17:04:21 EST WARNING: terminating connection because of crash of another server process 2013-11-05 17:04:21 EST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-11-05 17:04:21 EST HINT: In a moment you should be able to reconnect to the database and repeat your command. 2013-11-05 17:04:21 EST WARNING: terminating connection because of crash of another server process 2013-11-05 17:04:21 EST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-11-05 17:04:21 EST HINT: In a moment you should be able to reconnect to the database and repeat your command. 2013-11-05 17:04:21 EST WARNING: terminating connection because of crash of another server process 2013-11-05 17:04:21 EST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-11-05 17:04:21 EST HINT: In a moment you should be able to reconnect to the database and repeat your command. 2013-11-05 17:04:21 EST WARNING: terminating connection because of crash of another server process 2013-11-05 17:04:21 EST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-11-05 17:04:21 EST HINT: In a moment you should be able to reconnect to the database and repeat your command. 2013-11-05 17:04:21 EST CONTEXT: COPY zone_zipcode_demog_orlando, line 8986 2013-11-05 17:04:23 EST LOG: all server processes terminated; reinitializing 2013-11-05 17:04:23 EST LOG: database system was interrupted; last known up at 2013-11-05 17:01:22 EST 2013-11-05 17:04:23 EST LOG: database system was not properly shut down; automatic recovery in progress 2013-11-05 17:04:23 EST LOG: redo starts at 57/643FA1F8 2013-11-05 17:04:47 EST LOG: unexpected pageaddr 56/E5784000 in log segment 00000001000000570000008F, offset 7880704 2013-11-05 17:04:47 EST LOG: redo done at 57/8F783B68 2013-11-05 17:04:47 EST LOG: last completed transaction was at log time 2013-11-05 17:01:52.611585-05 2013-11-05 17:04:53 EST LOG: database system is ready to accept connections 2013-11-05 17:04:53 EST LOG: autovacuum launcher started psql reports: WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. CONTEXT: COPY zone_zipcode_demog_orlando, line 8986 server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost The command being issued is: dropdb u; createdb u; cat u.sql.gz | gzip -d | psql u u.sql.gz was created as: pg_dump u | gzip >u.sql.gz The database from which the dump was created is the same, 9.3.1, same operating system, both Ubuntu 12.10. Db's have same extensions installed. The source database shows no signs of issues.
robert@urban4m.com writes: > 2013-11-05 17:04:21 EST LOG: server process (PID 5869) was terminated by > signal 9: Killed OOM killer, almost certainly. Check your kernel log to confirm. Then read about preventing OOM kills: http://www.postgresql.org/docs/9.3/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT regards, tom lane
Thanks for the quick response, Tom. Sure enough. I found the OOM message in syslog. And thanks for the link. I'll try those suggestions. Just out of curiosity, will using psql -f alter the behavior, in this case, with respect to memory? Thanks again On Tue, Nov 5, 2013 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > robert@urban4m.com writes: > > 2013-11-05 17:04:21 EST LOG: server process (PID 5869) was terminated by > > signal 9: Killed > > OOM killer, almost certainly. Check your kernel log to confirm. > Then read about preventing OOM kills: > > http://www.postgresql.org/docs/9.3/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT > > regards, tom lane > -- .nix
Robert Nix <robert@urban4m.com> writes: > Thanks for the quick response, Tom. Sure enough. I found the OOM message in > syslog. And thanks for the link. I'll try those suggestions. Just out of > curiosity, will using psql -f alter the behavior, in this case, with > respect to memory? No, since the memory consumption was happening on the server side. It is somewhat of interest why you're apparently getting a memory leak on the server side, though. Maybe you were using an unsafely large work_mem or maintenance_work_mem setting? I've also seen some reports lately suggesting that PostGIS might cause intra-query memory leaks, which might explain this if data_partitions.zone_municipality_demog_houston contains any PostGIS data types. regards, tom lane
re: *work_mem. They are the defaults. In fact, i only changed checkpoint_segments to 64 because of too frequent checkpointing. All others are still the defaults. re: PostGIS. Yes, the table has 3 geometries. Most of the database is tables with geometries and some are far larger than that one and most of the largest will have been loaded before that one. It's running again after changing the sysctl setting and i'm monitoring memory. Will keep you posted. Thanks On Tue, Nov 5, 2013 at 9:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Nix <robert@urban4m.com> writes: > > Thanks for the quick response, Tom. Sure enough. I found the OOM message > in > > syslog. And thanks for the link. I'll try those suggestions. Just out of > > curiosity, will using psql -f alter the behavior, in this case, with > > respect to memory? > > No, since the memory consumption was happening on the server side. > > It is somewhat of interest why you're apparently getting a memory leak on > the server side, though. Maybe you were using an unsafely large work_mem > or maintenance_work_mem setting? I've also seen some reports lately > suggesting that PostGIS might cause intra-query memory leaks, which might > explain this if data_partitions.zone_municipality_demog_houston contains > any PostGIS data types. > > regards, tom lane > -- .nix
The sysctl change worked. Thanks for helping, Tom. Very much appreciated. On Tue, Nov 5, 2013 at 10:05 PM, Robert Nix <robert@urban4m.com> wrote: > re: *work_mem. They are the defaults. In fact, i only changed > checkpoint_segments to 64 because of too frequent checkpointing. All others > are still the defaults. > > re: PostGIS. Yes, the table has 3 geometries. Most of the database is > tables with geometries and some are far larger than that one and most of > the largest will have been loaded before that one. > > It's running again after changing the sysctl setting and i'm monitoring > memory. Will keep you posted. > > Thanks > > > > On Tue, Nov 5, 2013 at 9:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Robert Nix <robert@urban4m.com> writes: >> > Thanks for the quick response, Tom. Sure enough. I found the OOM >> message in >> > syslog. And thanks for the link. I'll try those suggestions. Just out of >> > curiosity, will using psql -f alter the behavior, in this case, with >> > respect to memory? >> >> No, since the memory consumption was happening on the server side. >> >> It is somewhat of interest why you're apparently getting a memory leak on >> the server side, though. Maybe you were using an unsafely large work_mem >> or maintenance_work_mem setting? I've also seen some reports lately >> suggesting that PostGIS might cause intra-query memory leaks, which might >> explain this if data_partitions.zone_municipality_demog_houston contains >> any PostGIS data types. >> >> regards, tom lane >> > > > > -- > .nix > -- .nix