Discussion: Blocked updates and background writer performance


Blocked updates and background writer performance

From: Cory Tucker
Date: Tue, Jan 12, 2016

PG 9.4.4 (RDS)

I'm experiencing an issue when trying to update many rows in a single table (one row at a time, but parallelized across ~12 connections). The writes are periodically blocked for several minutes at a time and then pick back up. After digging through our monitoring stack, I was able to uncover these stats, which seem to point to a background writer performance problem:
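
For reference, a minimal sketch of the shape of each update (the SET list and bind parameters are hypothetical; the column names are from the table shown below):

    -- Hypothetical single-row update, run in a loop on each of ~12 connections.
    -- $1 is the new jsonb document, $2 the target row's id.
    UPDATE ced
       SET data         = $1,
           modified_at  = now(),
           bean_version = bean_version + 1
     WHERE id = $2;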

bg_writer.png
(apologies for the image)

Our settings for the background writer are pretty much the out-of-the-box defaults (I threw in some others that I thought might be helpful, too):

          name           | setting | unit
-------------------------+---------+------
 bgwriter_delay          | 200     | ms
 bgwriter_lru_maxpages   | 100     |
 bgwriter_lru_multiplier | 2       |
 maintenance_work_mem    | 65536   | kB
 max_worker_processes    | 8       |
 work_mem                | 32768   | kB
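
(For reference, a table like this can be pulled straight from the standard pg_settings view; a minimal sketch:)

    -- One way to produce the settings table above.
    SELECT name, setting, unit
      FROM pg_settings
     WHERE name LIKE 'bgwriter%'
        OR name IN ('maintenance_work_mem', 'max_worker_processes', 'work_mem');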


The table that is being written to contains a jsonb column with a GIN index:

            Table "public.ced"
    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 id            | bigint                   | not null
 created_at    | timestamp with time zone |
 modified_at   | timestamp with time zone |
 bean_version  | bigint                   | default 0
 account_id    | bigint                   | not null
 data          | jsonb                    | not null
Indexes:
    "ced_pkey" PRIMARY KEY, btree (id)
    "ced_data" gin (data jsonb_path_ops)
    "partition_key_idx" btree (account_id, id)


It seems to me that the background writer just can't keep up with the volume of writes I'm generating and freezes all the updates. What are my options for improving background writer performance here?

thanks
--Cory



Re: Blocked updates and background writer performance

From: Dorian Hoxha
Date: Tue, Jan 12, 2016

Maybe you're not doing this, but: using a "data" json(b)/hstore column for all (or most) fields is an antipattern. Use "data" ONLY for fields you know will be dynamic.
That way the static fields live in regular columns (no key-name overhead, and better types), which means less data written to disk, etc. (selects will also be faster).
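
A minimal sketch of that split (the table and column names are made up):

    -- Promote fields that are always present into real columns; keep
    -- jsonb only for the genuinely dynamic remainder.
    CREATE TABLE example (
        id         bigint PRIMARY KEY,
        email      text NOT NULL,   -- was data->>'email'
        created_at timestamptz,     -- was data->>'created_at'
        extra      jsonb            -- only the truly dynamic fields
    );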


On Tue, Jan 12, 2016 at 7:25 PM, Cory Tucker <cory.tucker@gmail.com> wrote:
> [original message quoted in full; trimmed]

Re: Blocked updates and background writer performance

From: Cory Tucker
Date:

Thanks for the reply, Dorian. For the sake of argument, let's just say I'm definitely not doing what you mentioned. My question wasn't so much about modeling json storage as about tuning background writer performance.

On Tue, Jan 12, 2016 at 2:14 PM Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
> [Dorian's reply and the original message quoted in full; trimmed]

Re: Blocked updates and background writer performance

From: Jeff Janes
Date: Wed, Jan 13, 2016

On Tue, Jan 12, 2016 at 10:25 AM, Cory Tucker <cory.tucker@gmail.com> wrote:
> PG 9.4.4 (RDS)
>
> I'm experiencing an issue when trying to update many rows in a single table (one row at a time, but parallelized across ~12 connections). The writes are periodically blocked for several minutes at a time and then pick back up. After digging through our monitoring stack, I was able to uncover these stats, which seem to point to a background writer performance problem:

I'm not familiar with your monitoring stack. I assume bgwriter.sync_time refers to pg_stat_bgwriter.checkpoint_sync_time? Also, most of the stats shown will increase monotonically until the stats are reset. So it looks like your monitoring stack is either resetting stats frequently, or is implicitly taking a delta between consecutive periods for display purposes. And what are the units? bgwriter.buffers_checkpoint is presumably in buffers as the numerator, but over what period of time in the denominator?
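
(For reference, the cumulative counters behind graphs like these live in the pg_stat_bgwriter view; a quick way to sample them on 9.4:)

    -- Cumulative since the last pg_stat_reset_shared('bgwriter');
    -- the *_time columns are in milliseconds.
    SELECT checkpoints_timed, checkpoints_req,
           checkpoint_write_time, checkpoint_sync_time,
           buffers_checkpoint, buffers_clean, maxwritten_clean,
           buffers_backend, buffers_backend_fsync, buffers_alloc,
           stats_reset
      FROM pg_stat_bgwriter;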

Anyway, it looks to me like you have a checkpoint problem. The checkpoint overwhelms your IO system. The overwhelmed IO system then backs up into the bgwriter. What you see in the bgwriter is just a symptom, not the cause. The background writer is usually not very useful in recent versions of PostgreSQL anyway. But the same IO problem that is clogging up the background writer is also clogging up either your buffers_backend or your WAL writes/fsyncs, and both of those will destroy your throughput.
 

> [bg_writer.png and the bgwriter settings table quoted; trimmed]


What are your checkpoint settings?
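
(On 9.4 the relevant knobs are checkpoint_segments, checkpoint_timeout, and checkpoint_completion_target; an illustrative sketch of spreading checkpoint I/O over a longer window. The values are examples, not tuned recommendations, and on RDS they are set through the DB parameter group rather than ALTER SYSTEM:)

    -- Illustrative only; 9.4 defaults shown in comments.
    ALTER SYSTEM SET checkpoint_segments = 64;            -- default 3
    ALTER SYSTEM SET checkpoint_timeout = '15min';        -- default 5min
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;  -- default 0.5
    SELECT pg_reload_conf();  -- all three are reloadable, no restart needed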

Also, you should turn on log_checkpoints.
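
(log_checkpoints is also reloadable; a sketch for a self-managed instance — on RDS it likewise goes in the DB parameter group:)

    ALTER SYSTEM SET log_checkpoints = on;
    SELECT pg_reload_conf();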


> It seems to me that the background writer just can't keep up with the volume of writes I'm generating and freezes all the updates. What are my options for improving background writer performance here?

You probably just need more IO throughput in general.

Cheers,

Jeff

Re: Blocked updates and background writer performance

From: Vick Khera
Date: Wed, Jan 13, 2016


On Wed, Jan 13, 2016 at 11:51 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Anyway, it looks to me like you have a checkpoint problem. The checkpoint overwhelms your IO system. The overwhelmed IO system then backs up into the bgwriter. What you see in the bgwriter is just a symptom, not the cause. The background writer is usually not very useful in recent versions of PostgreSQL anyway. But the same IO problem that is clogging up the background writer is also clogging up either your buffers_backend or your WAL writes/fsyncs, and both of those will destroy your throughput.

That was my intuition too. Not enough I/O available from the hardware for the workload requested.

As recommended, log your checkpoints and try tuning them to spread the load.

Re: Blocked updates and background writer performance

From: Cory Tucker
Date:

On Wed, Jan 13, 2016 at 9:48 AM Vick Khera <vivek@khera.org> wrote:
> That was my intuition too. Not enough I/O available from the hardware for the workload requested.
>
> As recommended, log your checkpoints and try tuning them to spread the load.

Thanks guys, will turn on checkpoint logging and try to sniff this out further.