My Experiment of PG crash when dealing with huge amount of data

Поиск
Список
Период
Сортировка
От 高健
Тема My Experiment of PG crash when dealing with huge amount of data
Дата
Msg-id CAL454F3M19-Qgbfj3fp8_=w3CWWy5AWGKNTP=sD33EfFDv6JnA@mail.gmail.com
обсуждение исходный текст
Ответы Re: My Experiment of PG crash when dealing with huge amount of data  (hxreno1 <hxreno1@gmail.com>)
Re: My Experiment of PG crash when dealing with huge amount of data  (Michael Paquier <michael.paquier@gmail.com>)
Re: My Experiment of PG crash when dealing with huge amount of data  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-general
Hello:

I  have done the following experiment to test :

PG's activity when dealing with data which is bigger in size than total memory of the whole os system.

The result is:
PG says:
---------------------------- 
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
The connection to the server was lost. Attempting reset: Failed.
!> 
-----------------------------

In log, I can see the following:
LOG:  background writer process (PID 3221) was terminated by signal 9: Killed

But why it need to kill the bgwriter ? 
And that PG will use as much resource as it can to finish a sql dml , till it used almost all the resource and then crash?  

I used the default  checkpoint_segments(3) and shard_buffers(32MB) settings. And my machine has only 1024MB memory.
The process is as :
1) create table whose one tuple is more than 1KB:

postgres=# create table test01(id integer, val char(1024)); 

2) insert into the table 2457600 records, which makes totally more than 2400MB:

postgres=# insert into test01 values(generate_series(1,2457600),repeat( chr(int4(random()*26)+65),1024));

It really took a few minutes, When the sql statement run, I can see that the server process is consuming 80% of total memory of the os.

----------------------------------------------------------------------------------------------------------------------------------

[root@server ~]# ps aux | grep post

root      3180  0.0  0.0 105296   712 pts/1    S    16:31   0:00 su - postgres

postgres  3181  0.0  0.0  70304   676 pts/1    S+   16:31   0:00 -bash

postgres  3219  0.0  0.2 113644  2864 pts/1    S    16:32   0:00 /usr/local/pgsql/bin/postgres -D /gao/data

postgres  3221  0.4  3.0 113724 35252 ?        Ss   16:32   0:01 postgres: writer process                  

postgres  3222  0.2  0.1 113644  1616 ?        Ds   16:32   0:00 postgres: wal writer process              

postgres  3223  0.0  0.0 114380  1148 ?        Ss   16:32   0:00 postgres: autovacuum launcher process     

postgres  3224  0.0  0.0  73332   472 ?        Ss   16:32   0:00 postgres: stats collector process         

root      3252  0.0  0.0 105296   712 pts/2    S    16:32   0:00 su - postgres

postgres  3253  0.0  0.0  70304   676 pts/2    S    16:32   0:00 -bash

postgres  3285  0.0  0.0  83488   740 pts/2    S+   16:32   0:00 ./psql

postgres  3286 14.8 80.2 2598332 924308 ?      Ds   16:32   0:35 postgres: postgres postgres [local] INSERT

root      3333  0.0  0.0  65424   812 pts/3    S+   16:36   0:00 grep post

--------------------------------------------------------

After a while, I found in the PG's log , the following information:
--------------------------------------------------------

LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (11 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (15 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (23 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (13 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  background writer process (PID 3221) was terminated by signal 9: Killed
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
FATAL:  the database system is in recovery mode
LOG:  database system was interrupted; last known up at 2013-08-30 16:36:42 CST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/B7657BD0
LOG:  redo starts at 0/B60FE2B8
LOG:  unexpected pageaddr 0/B044C000 in log file 0, segment 184, offset 4505600
LOG:  redo done at 0/B844B940
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

-------------------------------------------------------------------------------------

Best Regards

В списке pgsql-general по дате отправления:

Предыдущее
От: Mistina Michal
Дата:
Сообщение: pgpool-ha not found in yum postgres repository
Следующее
От: Andreas Joseph Krogh
Дата:
Сообщение: Re: Why is NULL = unbounded for rangetypes?