Re: postgresql latency & bgwriter not doing its job

From: Fabien COELHO
Subject: Re: postgresql latency & bgwriter not doing its job
Date:
Msg-id: alpine.DEB.2.10.1408260845110.4394@sto
In reply to: Re: postgresql latency & bgwriter not doing its job  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: postgresql latency & bgwriter not doing its job  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hello Amit,

> I think another thing to know here is why exactly checkpoint
> storm is causing tps to drop so steeply.

Yep. Actually it is not strictly 0, but a "few" tps that I rounded to 0.
  progress: 63.0 s, 47.0 tps, lat 2.810 ms stddev 5.194, lag 0.354 ms
  progress: 64.1 s, 11.9 tps, lat 81.481 ms stddev 218.026, lag 74.172 ms
  progress: 65.2 s, 1.9 tps, lat 950.455 ms stddev 125.602, lag 1421.203 ms
  progress: 66.1 s, 4.5 tps, lat 604.790 ms stddev 440.073, lag 2418.128 ms
  progress: 67.2 s, 6.0 tps, lat 322.440 ms stddev 68.276, lag 3146.302 ms
  progress: 68.0 s, 2.4 tps, lat 759.509 ms stddev 62.141, lag 4229.239 ms
  progress: 69.4 s, 3.6 tps, lat 440.335 ms stddev 369.207, lag 4842.371 ms

Transactions are 4.8 seconds behind schedule at this point.

> One reason could be that backends might need to write more WAL due 
> Full_Page_Writes, another could be contention around buffer 
> content_lock. To dig more about the reason, the same tests can be tried 
> by making Full_Page_Writes = off and/or synchronous_commit = off to see 
> if WAL writes are causing tps to go down.

Given the small flow of updates, I do not think there should be that much 
write contention between WAL and checkpoint writes.

I tried with "full_page_writes = off" for 500 seconds: same overall 
behavior, with 8.5% of transactions stuck (instead of 10%). However, the 
details in pg_stat_bgwriter are quite different:
  buffers_checkpoint = 13906  buffers_clean = 20748  buffers_backend = 472

That seems to suggest that the bgwriter did some work for once, but that 
it did not change the end result much. This would imply that my suggestion 
to make the bgwriter write more would not fix the problem on its own.

With "synchronous_commit = off", the situation is much improved, with only 
0.3% of transactions stuck. Not a surprise. However, I would not recommand 
that as a solution:-)
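
For reference, the GUC variations above can be driven from a shell roughly 
as follows; this is only a sketch, and the database name "bench" and the 
exact psql invocations are assumptions, not the actual test script (the 
synchronous_commit = off run is the same with the other setting):

  # sketch only: database name "bench" is an assumption
  psql -X -d bench -c "ALTER SYSTEM SET full_page_writes = off"
  psql -X -d bench -c "SELECT pg_reload_conf()"
  # ... run the throttled pgbench test for 500 seconds ...
  # then look at how buffers were written out
  psql -X -d bench -c "SELECT buffers_checkpoint, buffers_clean, buffers_backend FROM pg_stat_bgwriter"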

Currently, the only way I was able to "solve" the issue while still 
writing to disk is to send "CHECKPOINT" every 0.2s, as if I had set 
"checkpoint_timeout = 0.2s" (although this is not currently allowed).

> Similarly for checkpoints, use  checkpoint_completion_target to
> spread the checkpoint_writes as suggested by Jeff as well to see
> if that can mitigate the problem.

I had already tried that, and retried it after Jeff's suggestion. It does 
not seem to mitigate anything; if anything, the contrary.
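
For completeness, spreading is controlled along these lines; the 0.9 value 
below is only an assumed example, not the exact setting I used:

  # assumed example value; takes effect after a configuration reload
  psql -X -c "ALTER SYSTEM SET checkpoint_completion_target = 0.9"
  psql -X -c "SELECT pg_reload_conf()"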

-- 
Fabien.


