Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

Поиск
Список
Период
Сортировка
От Vladimir Nicolici
Тема Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time
Дата
Msg-id 59d7cdda.952d1c0a.13518.9efc@mx.google.com
обсуждение исходный текст
Ответ на Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time  (Vladimir Nicolici <vladnc@gmail.com>)
Ответы Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time  (Vladimir Nicolici <vladnc@gmail.com>)
Список pgsql-general

Further updates:

 

Yesterday checkpoints were finishing more or less on time with the configuration for 25 minutes out of 30 minutes, taking 26 minutes at most.

 

So for today I reduced the time reserved for checkpoint writes to 20 minutes out of 30 minutes, by setting checkpoint_completion_target to 0.66 instead of 0.83, and I disabled the bgwriter completely by setting bgwriter_lru_maxpages from 100 to 0.

 

I expected this might cause the checkpoints to take more time, and not finish in 20 minutes, but that didn’t happen.

 

For the first part of the day, when the activity was lower, checkpoints finished almost on time, less than 30 seconds more than configured:

 

2017-10-06 10:30:56.248 EDT [4927] LOG:  checkpoint starting: time

2017-10-06 10:51:26.227 EDT [4927] LOG:  checkpoint complete: wrote 4364979 buffers (23.1%); 0 transaction log file(s) added, 0 removed, 1936 recycled; write=1187.940 s, sync=1.888 s, total=1229.979 s; sync files=744, longest=0.041 s, average=0.002 s; distance=31073898 kB, estimate=32879405 kB

 

Then later, as the activity increased, something I didn’t expect happened. The checkpoints started to finish much sooner than configured, in 16:38 minutes instead of 20 minutes, so 3:22 minutes earlier than configured:

 

2017-10-06 11:00:56.347 EDT [4927] LOG:  checkpoint starting: time

2017-10-06 11:17:34.778 EDT [4927] LOG:  checkpoint complete: wrote 5376422 buffers (28.5%); 0 transaction log file(s) added, 0 removed, 1897 recycled; write=959.908 s, sync=1.578 s, total=998.431 s; sync files=772, longest=0.045 s, average=0.002 s; distance=37712624 kB, estimate=37712624 kB

 

Then, even stranger, the next checkpoint started sooner than configured, by 2:11 minutes, with the reason “xlog” instead of “time”:

 

2017-10-06 11:27:48.754 EDT [4927] LOG:  checkpoint starting: xlog

2017-10-06 11:42:00.131 EDT [4927] LOG:  checkpoint complete: wrote 6123787 buffers (32.4%); 0 transaction log file(s) added, 239 removed, 2063 recycled; write=806.887 s, sync=1.032 s, total=851.377 s; sync files=767, longest=0.052 s, average=0.001 s; distance=40254253 kB, estimate=40254253 kB

 

All subsequent checkpoints for the rest of the day look similar, they start with the reason “xlog”, and finish in around 15 minutes each instead of 20, and the time between two checkpoint starts is around 25 minutes instead of 30.

 

I’m not sure what causes these early finishes and early starts. The checkpoint start message is not very helpful in determining why it started early, in my opinion. It’s saying “xlog”, but a more verbose description, maybe even including some numbers it based its decision to start the checkpoint on, would be more helpful.

 

I’m guessing the reason for starting a checkpoint is that it estimated the 96 GB reserved with max_wal_size would be filled unless it starts a checkpoint earlier than configured. I’ll increase the max_wal_size from 96 GB to 144 GB for Monday, to see if it changes this behavior.

 

Thanks,

Vlad

В списке pgsql-general по дате отправления:

Предыдущее
От: pinker
Дата:
Сообщение: Re: [GENERAL] delete a file everytime pg server starts/crashes
Следующее
От: "Igal @ Lucee.org"
Дата:
Сообщение: [GENERAL] Functions and Parentheses