Discussion: General performance/load issue
Hello everyone,

I'm having some trouble with a PostgreSQL server. We're using PG as the
database backend for a very big website (lots of data and much traffic).

The issue : about one hour after a restart, the server suddenly becomes slow
(queries stop responding), load rises (>20 instead of 1), and iowait rises
(20 to 70%).

Version : 9.0.5
Server  : Dual Xeon X5650 (24 cores total)
Memory  : 48 GB
Disks   : SSD

Top when overloaded :

21537 postgres  20  0 6420m  899m  892m D  22  1.9  0:01.86 postgres
21534 postgres  20  0 6554m  1.3g  1.1g D  14  2.7  0:01.72 postgres
21518 postgres  20  0 6419m  2.1g  2.1g D  10  4.6  0:02.10 postgres
21038 postgres  20  0 6716m  3.8g  3.5g D  10  8.0  0:45.46 postgres
21103 postgres  20  0 6571m  3.7g  3.6g D   9  7.8  0:32.19 postgres
21079 postgres  20  0 6575m  3.8g  3.7g D   8  8.1  0:36.39 postgres
21359 postgres  20  0 6563m  1.9g  1.8g D   4  4.1  0:09.10 postgres
21422 postgres  20  0 6563m  1.9g  1.7g D   4  3.9  0:08.34 postgres
19656 postgres  20  0 6727m  5.3g  5.0g D   3 11.1  1:58.25 postgres
21418 postgres  20  0 6685m  2.1g  1.9g D   2  4.5  0:12.42 postgres
21413 postgres  20  0 6693m  2.1g  1.8g D   2  4.4  0:11.06 postgres
21541 postgres  20  0 6421m  719m  711m D   1  1.5  0:00.48 postgres
14044 postgres  20  0 6418m  755m  750m D   1  1.6  0:04.71 postgres
21326 postgres  20  0 6685m  2.2g  2.0g D   1  4.7  0:15.82 postgres
21031 postgres  20  0 6688m  3.6g  3.4g D   1  7.7  0:44.18 postgres
21055 postgres  20  0 6575m  4.1g  3.9g D   1  8.6  0:39.11 postgres
21357 postgres  20  0 6693m  2.3g  2.0g D   1  4.9  0:13.74 postgres

Memory when overloaded :

~ # free -m
             total       used       free     shared    buffers     cached
Mem:         48339      47087       1251          0        248      38720
-/+ buffers/cache:       8118      40220
Swap:         8190          1       8189

postgresql.conf :

max_connections = 50
shared_buffers = 12G
temp_buffers = 40MB
work_mem = 128MB
maintenance_work_mem = 256MB
effective_cache_size = 12GB
max_files_per_process = 8192
fsync = off
checkpoint_segments = 256
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
seq_page_cost = 2.0
random_page_cost = 2.0

Did I do anything wrong? Any idea?

Regards,

Gaëtan
On 24 November 2011, 14:51, Gaëtan Allart wrote:
> Hello everyone,
>
> I'm having some trouble with a PostgreSQL server. We're using PG as the
> database backend for a very big website (lots of data and much traffic).
>
> The issue : about one hour after a restart, the server suddenly becomes
> slow (queries stop responding), load rises (>20 instead of 1), and iowait
> rises (20 to 70%).
>
> Version : 9.0.5
> Server  : Dual Xeon X5650 (24 cores total)
> Memory  : 48 GB
> Disks   : SSD
>
> Top when overloaded :

Top is not the most useful tool here, I guess. Use "iotop" (it will show you
which processes are doing the I/O) and tools like vmstat / iostat.

> postgresql.conf :
>
> max_connections = 50
> shared_buffers = 12G
> temp_buffers = 40MB
> work_mem = 128MB
> maintenance_work_mem = 256MB
> max_files_per_process = 8192
> checkpoint_segments = 256
> checkpoint_timeout = 30min
> checkpoint_completion_target = 0.9

Fine. Let's look at the options that seem suspicious.

> effective_cache_size = 12GB

Why have you set it like this? According to the "free" output you've posted,
the cache holds about 38 GB, so why just 12 GB here? There are possible
reasons for that, but I don't think they apply here.

> fsync = off

A really bad idea. I guess your data are worthless to you, right?

> seq_page_cost = 2.0
> random_page_cost = 2.0

Eh? First of all, what really matters is the relative value of those two
settings, and it's a good habit to leave seq_page_cost = 1.0 and change just
the other values. Plus random I/O is not as cheap as sequential I/O even on
SSD drives, so I'd recommend something like this:

seq_page_cost = 1.0
random_page_cost = 2.0 (or maybe 1.5)

Anyway, this needs to be tested properly - watch the performance and tune if
needed.

> Did I do anything wrong? Any idea?

Not sure. My guess is you're getting bitten by a checkpoint. We need to know
a few more details.
1) What are dirty_background_ratio / dirty_ratio (see the /proc/sys/vm/
   directory)?

2) Enable log_checkpoints in postgresql.conf and see how checkpoints
   correlate with the bad performance.

3) Check which processes are responsible for the I/O (use iotop).

Tomas
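For reference, the three checks can be run like this from a shell (a sketch
only; the data directory path is a placeholder, and iotop may need to be
installed separately):

```shell
# 1) Kernel writeback thresholds (percent of reclaimable memory)
cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio

# 2) In postgresql.conf set:  log_checkpoints = on
#    then reload (this parameter does not require a restart):
pg_ctl reload -D /path/to/data    # data directory is a placeholder

# 3) Show only the processes actually doing I/O
iotop -o
```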
Hi Tomas,

I will be using iotop ;)
Right now, most I/O comes from the "postgres: wal writer process".

-> effective_cache_size : okay, I'll raise it to 32 GB.
-> fsync : changed to on ;)
-> seq_page_cost / random_page_cost : I'll run tests. Thanks.
-> dirty :

cat /proc/sys/vm/dirty_ratio
20
cat /proc/sys/vm/dirty_background_ratio
10

Thanks a lot Tomas. You're really helpful!

Gaëtan

On 24/11/11 15:09, "Tomas Vondra" <tv@fuzzy.cz> wrote:
[quoted message trimmed]
On 24 November 2011, 15:27, Gaëtan Allart wrote:
> Hi Tomas,
>
> I will be using iotop ;)
> Right now, most I/O comes from the "postgres: wal writer process".

What do you mean by "most I/O" - how much data is it writing? Is there a
vacuum running at the same time? What other processes are doing I/O? Post a
few lines of "iostat -x 5" so we know what kind of I/O we're dealing with.

> -> dirty :
>
> cat /proc/sys/vm/dirty_ratio
> 20
> cat /proc/sys/vm/dirty_background_ratio
> 10

This means the cache may hold up to about 3.2 GB of dirty data before the
system starts writing it out in the background, and about 6.4 GB before
processes can no longer use the write cache.

What about /proc/sys/vm/dirty_expire_centisecs?

Is there something interesting in the postgresql log? Autovacuum maybe?

Tomas
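The arithmetic behind those 3.2 GB / 6.4 GB figures, as a sketch (the kernel
applies the ratios to reclaimable memory, assumed here to be roughly 32 GB on
this box; the exact base varies between kernel versions):

```shell
cacheable_mb=32768   # assumption: ~32 GB of reclaimable memory
bg_ratio=10          # dirty_background_ratio
fg_ratio=20          # dirty_ratio

# Background writeback (pdflush) starts above this much dirty data:
echo "background: $((cacheable_mb * bg_ratio / 100)) MB"   # ~3.2 GB
# Writers are throttled into synchronous writes above this:
echo "foreground: $((cacheable_mb * fg_ratio / 100)) MB"   # ~6.5 GB
```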
On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 24 November 2011, 14:51, Gaëtan Allart wrote:
>> Hello everyone,
>>
>> I'm having some trouble with a PostgreSQL server. We're using PG as the
>> database backend for a very big website (lots of data and much traffic).
>>
>> The issue : about one hour after a restart, the server suddenly becomes
>> slow (queries stop responding), load rises (>20 instead of 1), and
>> iowait rises (20 to 70%).
>>
>> Version : 9.0.5
>> Server  : Dual Xeon X5650 (24 cores total)
>> Memory  : 48 GB
>> Disks   : SSD
>>
>> Top when overloaded :
>
> Top is not the most useful tool here, I guess. Use "iotop" (it will show
> you which processes are doing the I/O) and tools like vmstat / iostat.
>
>> postgresql.conf :
>>
>> max_connections = 50
>> shared_buffers = 12G
>> temp_buffers = 40MB
>> work_mem = 128MB
>> maintenance_work_mem = 256MB
>> max_files_per_process = 8192
>> checkpoint_segments = 256
>> checkpoint_timeout = 30min
>> checkpoint_completion_target = 0.9
>
> Fine. Let's look at the options that seem suspicious.

I think you missed some suspicious settings... I'd recommend setting
shared_buffers to 8 GB, and I'd likely reduce checkpoint_segments to 30 and
set the checkpoint timeout back to 5 minutes. Everything about the way this
server is configured (including those vm settings) is pushing it towards
delaying the WAL/buffer/checkpoint work as long as possible, which matches
the pattern of good performance initially followed by a period of poor
performance and heavy I/O.

On a side note, I'd guess your work_mem is probably too high. 50
(connections) x 128 (MB work_mem) x 2 (sorts per query) = 12.8 GB RAM,
which is about 25% of total RAM on the box. That doesn't necessarily mean
game over, but it seems like it wouldn't be that hard to end up thrashing
when set up that way. YMMV.

Robert Treat
conjecture: xzilla.net
consulting: omniti.com
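Robert's worst-case estimate, spelled out (the factor of 2 sorts per query is
a rule of thumb he assumes, not a hard bound - a single complex query can
allocate work_mem several times over):

```shell
connections=50      # max_connections
work_mem_mb=128     # work_mem
sorts_per_query=2   # assumed rule of thumb

echo "worst case: $((connections * work_mem_mb * sorts_per_query)) MB"
# 12800 MB, i.e. ~12.5 GB -- roughly a quarter of the 48 GB box
```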
On 24 November 2011, 16:39, Robert Treat wrote:
> On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> On 24 November 2011, 14:51, Gaëtan Allart wrote:
>>> postgresql.conf :
>>>
>>> max_connections = 50
>>> shared_buffers = 12G
>>> temp_buffers = 40MB
>>> work_mem = 128MB
>>> maintenance_work_mem = 256MB
>>> max_files_per_process = 8192
>>> checkpoint_segments = 256
>>> checkpoint_timeout = 30min
>>> checkpoint_completion_target = 0.9
>>
>> Fine. Let's look at the options that seem suspicious.
>
> I think you missed some suspicious settings... I'd recommend setting
> shared_buffers to 8 GB, and I'd likely reduce checkpoint_segments to 30
> and set the checkpoint timeout back to 5 minutes. Everything about the
> way this server is configured (including those vm settings) is pushing
> it towards delaying the WAL/buffer/checkpoint work as long as possible,
> which matches the pattern of good performance initially followed by a
> period of poor performance and heavy I/O.

Yes, checkpoints were my first thought too. OTOH the OP reported that most
of the I/O is caused by the WAL writer - and that's not exactly the part
that does the work during a checkpoint. Plus WAL writes can't really be
postponed, as WAL is usually O_DIRECT and fsynced, right?

You're right that the writes are postponed, but I generally see that as a
good thing when combined with spread checkpoints. And even with those vm
settings (about 3.2 GB for background writes), I wouldn't expect this
behaviour (because pages in the page cache usually expire after 30 seconds).
Say 100% of the shared buffers are dirty and need to be written. You have
27 minutes (30 * 0.9) to do that - that means about 8 MB/s. With a
30-second expiry there might be about 240 MB accumulated before pdflush
starts writing the data to the SSD. And the SSD can surely handle more than
50 MB/s. So why the long delay? The question is what else is going on there.

But all this is just guessing - I want to see the log_checkpoints messages,
iostat results etc.

> On a side note, I'd guess your work_mem is probably too high. 50
> (connections) x 128 (MB work_mem) x 2 (sorts per query) = 12.8 GB RAM,
> which is about 25% of total RAM on the box. That doesn't necessarily mean
> game over, but it seems like it wouldn't be that hard to end up thrashing
> when set up that way. YMMV.

This is one of the reasons why effective_cache_size should be lower than
32GB, probably ...

Tomas
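Tomas's spread-checkpoint estimate, made explicit (it assumes the
pathological case where the entire 12 GB of shared_buffers is dirty):

```shell
shared_buffers_mb=12288                        # 12 GB, all assumed dirty
checkpoint_timeout_s=$((30 * 60))              # checkpoint_timeout = 30min
spread_s=$((checkpoint_timeout_s * 9 / 10))    # completion_target 0.9 -> 1620 s

echo "required write rate: ~$((shared_buffers_mb / spread_s)) MB/s"
# ~7-8 MB/s -- trivial for an SSD, hence the puzzlement about the stalls
```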
Hi Robert,

Thanks for your help as well.

You're right about checkpoints: the server runs pretty well after a start,
then runs into heavy I/O. I've changed these settings, and also reduced
work_mem a little and effective_cache_size, by the way:

LOG:  parameter "work_mem" changed to "96MB"
LOG:  parameter "effective_cache_size" changed to "24GB"
LOG:  parameter "checkpoint_segments" changed to "40"
LOG:  parameter "checkpoint_timeout" changed to "5min"

Apparently, it's been running fine since I made the first changes
recommended by Tomas. Let's wait for a couple of hours again to confirm
this.

Gaëtan

On 24/11/11 16:39, "Robert Treat" <rob@xzilla.net> wrote:
[quoted message trimmed]
Tomas,

I've enabled logging of checkpoints. I'm waiting for the next I/O crisis...

Gaëtan

On 24/11/11 17:02, "Tomas Vondra" <tv@fuzzy.cz> wrote:
[quoted message trimmed]
On 24 November 2011, 17:02, Tomas Vondra <tv@fuzzy.cz> wrote:
> [quoted message trimmed]
>
> This is one of the reasons why effective_cache_size should be lower than
> 32GB, probably ...

According to the 'free' output, 38 GB is what is there right now.
effective_cache_size is just informative, so you can set it to 1 TB without
memory issues. And it covers OS cache + PG cache.

There is not enough information yet to be sure about what's happening. The
log_checkpoints output will help for sure.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Finally, it "crashed" again... :-(

Here's the output of iotop while the database was inaccessible :

32361 be/4 postgres   0.00 B/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(34847) idle
32244 be/4 postgres 163.48 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(34660) SELECT
32045 be/4 postgres   7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(33765) SELECT
32158 be/4 postgres   7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(34112) SELECT
32242 be/4 postgres   7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(34632) SELECT
32372 be/4 postgres   0.00 B/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(38858) idle in transaction
32231 be/4 postgres  15.57 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(38602) SELECT
28811 be/4 postgres   3.89 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(40594) SELECT
32190 be/4 postgres   3.89 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(38497) SELECT

And the latest logs :

LOG:  checkpoint complete: wrote 3192 buffers (0.2%); 0 transaction log file(s) added, 1 removed, 0 recycled; write=262.636 s, sync=135.456 s, total=416.630 s
LOG:  checkpoint complete: wrote 716 buffers (0.0%); 0 transaction log file(s) added, 1 removed, 0 recycled; write=165.497 s, sync=17.111 s, total=193.199 s

WARNING:  pgstat wait timeout
STATEMENT:  SELECT 'commit',sum(pg_stat_get_db_xact_commit(oid)) FROM pg_database WHERE datname=$1
            UNION ALL
            SELECT 'rollback',sum(pg_stat_get_db_xact_rollback(oid)) FROM pg_database WHERE datname=$2

LOG:  checkpoint complete: wrote 699 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=203.023 s, sync=119.037 s, total=335.012 s
LOG:  checkpoint complete: wrote 348 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=120.412 s, sync=0.020 s, total=120.435 s

Does this help?
Gaëtan

On 24/11/11 17:02, "Tomas Vondra" <tv@fuzzy.cz> wrote:
[quoted message trimmed]
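What stands out in those checkpoint lines is the sync phase. A throwaway
one-liner (illustrative only; the log text is copied from the message above)
shows the proportion:

```shell
line='LOG:  checkpoint complete: wrote 3192 buffers (0.2%); 0 transaction log file(s) added, 1 removed, 0 recycled; write=262.636 s, sync=135.456 s, total=416.630 s'

echo "$line" | awk '{
  for (i = 1; i <= NF; i++) {          # pick out the key=value timing fields
    if ($i ~ /^sync=/)  { sub(/^sync=/,  "", $i); s = $i }
    if ($i ~ /^total=/) { sub(/^total=/, "", $i); t = $i }
  }
  printf "sync took %.1f of %.1f s (%.0f%%)\n", s, t, 100 * s / t
}'
# -> sync took 135.5 of 416.6 s (33%)
```

A sync phase eating a third of a six-minute checkpoint, for a few dozen MB
of buffers, is the anomaly the rest of the thread chases.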
On 24 November 2011, 20:38, Gaëtan Allart <gaetan@nexylan.com> wrote:
> Finally, it "crashed" again... :-(
>
> Here's the output of iotop while the database was inaccessible :
>
> [iotop output trimmed]
>
> And the latest logs :
>
> LOG:  checkpoint complete: wrote 3192 buffers (0.2%); 0 transaction log
> file(s) added, 1 removed, 0 recycled; write=262.636 s, sync=135.456 s,
> total=416.630 s
> [further log lines trimmed]
>
> Does this help?

Yes. It seems you have an issue with your checkpoint sync times. It is
fixed in 9.1 and backported to 9.0 here:
http://projects.2ndquadrant.com/backports

It is possible you have other problems that explain the issue you have. An
immediate solution, before trying a patch, is to reduce your shared_buffers
setting to something very low, like 1 GB.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On 24 November 2011, 23:19, Cédric Villemain wrote:
> On 24 November 2011, 20:38, Gaëtan Allart <gaetan@nexylan.com> wrote:
>> Finally, it "crashed" again... :-(
>>
>> Here's the output of iotop while the database was inaccessible :
>>
>> 32361 be/4 postgres   0.00 B/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(34847) idle
>> 32244 be/4 postgres 163.48 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(34660) SELECT
>> 32045 be/4 postgres   7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(33765) SELECT
>> 32158 be/4 postgres   7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(34112) SELECT
>> 32242 be/4 postgres   7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(34632) SELECT
>> 32372 be/4 postgres   0.00 B/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(38858) idle in transaction
>> 32231 be/4 postgres  15.57 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(38602) SELECT
>> 28811 be/4 postgres   3.89 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(40594) SELECT
>> 32190 be/4 postgres   3.89 K/s 0.00 B/s 0.00 % 99.99 % postgres: mydb mydb host(38497) SELECT

This iotop output is rather incomplete - it does not show which of the
processes actually did the I/O. The fact that a process accounts for 99% of
the I/O time and yet performs almost no reads or writes is a bit strange.

Use "iotop -o" and post the result.

>> Does this help?
>
> Yes. It seems you have an issue with your checkpoint sync times. It is
> fixed in 9.1 and backported to 9.0 here:
> http://projects.2ndquadrant.com/backports

People generally don't want to apply backports on their own, especially
when it's a production server and when it's unclear whether it actually
fixes the issue they have. I'm not sure about that.

> It is possible you have other problems that explain the issue you have.
> An immediate solution, before trying a patch, is to reduce your
> shared_buffers setting to something very low, like 1 GB.

Well, using low shared_buffers was common especially before 8.3, when
spread checkpoints were not available. It prevents the I/O overload that
occurs when the database suddenly decides to write out all of the dirty
buffers. But he's on 9.0, so he already has spread checkpoints.

Plus the number of buffers he's writing is negligible - usually about 700
buffers (6 MB), and 3192 buffers (25 MB) at most. That surely should not be
a problem for the SSD he's using.

Tomas
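The buffer-to-megabyte conversion behind those figures, for reference (8 kB
is the default PostgreSQL block size, BLCKSZ):

```shell
block_kb=8   # default BLCKSZ = 8 kB
for buffers in 716 3192; do
  echo "$buffers buffers = $((buffers * block_kb / 1024)) MB"
done
# ~5 MB and ~24 MB respectively -- tiny amounts for sync phases
# that nonetheless ran for minutes in the posted logs
```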
Le 25 novembre 2011 11:25, Tomas Vondra <tv@fuzzy.cz> a écrit : > On 24 Listopad 2011, 23:19, Cédric Villemain wrote: >> Le 24 novembre 2011 20:38, Gaëtan Allart <gaetan@nexylan.com> a écrit : >>> Finally, it "crashed" againŠ :-( >>> >>> Here's the output of iotop while databased was inaccessible : >>> >>> 32361 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(34847) idle >>> 32244 be/4 postgres 163.48 K/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(34660) SELECT >>> 32045 be/4 postgres 7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(33765) SELECT >>> 32158 be/4 postgres 7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(34112) SELECT >>> 32242 be/4 postgres 7.78 K/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(34632) SELECT >>> 32372 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(38858) idle in transaction >>> 32231 be/4 postgres 15.57 K/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(38602) SELECT >>> 28811 be/4 postgres 3.89 K/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(40594) SELECT >>> 32190 be/4 postgres 3.89 K/s 0.00 B/s 0.00 % 99.99 % postgres: >>> mydb >>> mydb host(38497) SELECT > > This iotop output is rather incomplete - it does not show which of the > processes actually did the I/O. The fact that a process does 99% of the > I/O and yet performs no reads or writes is a bit strange. > > Use "iotop -o" and post the result. > >>> Does this help? >> >> yes. >> It seem you have an issue with your checkpoint syncing time, it is >> fixed in 9.1 and backported in 9.0 here : >> http://projects.2ndquadrant.com/backports > > People generally don't want to apply backports on their own, especially > when it's a production server and when it's unclear it actually fixes the > issue they have. I'm not sure about that. 
I agree that most people don't want to do that themselves, but if it happens to be the solution, they can proceed or ask someone to do it. People want to see their production system back to a normal situation. The limited information here is not enough to be sure, but the checkpoint sync times are clear: the sync times are not right. It is very probable that compacting the fsyncs will help, but it is not certain yet that it is required.

>> It is possible you have other problems that explain the issue you
>> have. An immediate solution before trying a patch is to reduce your
>> shared_buffers setting to something very low, like 1GB.
>
> Well, keeping shared_buffers low was mostly a workaround from before
> 8.3, when spread checkpoints were not available. It prevents the I/O
> overload when the database suddenly decides to write all of the dirty
> buffers. But he's on 9.0 (so he already has spread checkpoints).

It is a different animal here.

> Plus the number of buffers he's writing is negligible - usually about
> 700 buffers (6MB), 3192 buffers (25MB) at most. That surely should not
> be a problem for the SSD he's using.

See the blog entry from Greg Smith:
http://blog.2ndquadrant.com/en/2011/06/backporting-and-checkpoint-tro.html

And the slides of his talk at PGCon 2011:
http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf

I was just pointing out that there are known issues in this area, with known solutions.

Getting more information on vacuum and bgwriter activity should help too.

Gaëtan, do you still have critical performance issues, or is the server back to a normal/loaded (but not overloaded) situation?

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On 25 November 2011, 12:43, Cédric Villemain wrote:
> People want to see their production system back to a normal situation.
> The limited information here is not enough to be sure, but the
> checkpoint sync times are clear: the sync times are not right.
> It is very probable that compacting the fsyncs will help, but it is not
> certain yet that it is required.

Yes, the sync times are quite crazy. Especially given the tiny number of buffers to be written and the fact that the SSD should handle random I/O quite well.
> See the blog entry from Greg Smith:
> http://blog.2ndquadrant.com/en/2011/06/backporting-and-checkpoint-tro.html
>
> And the slides of his talk at PGCon 2011:
> http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf
>
> I was just pointing out that there are known issues in this area, with
> known solutions.

Thanks for the links, interesting stuff. Still, my impression is that the SSD is stressed by something else, and the high fsync values during a checkpoint are merely a symptom. So fixing the checkpoint (using the backpatch) won't actually fix the issue. But I'm just guessing here.

> Getting more information on vacuum and bgwriter activity should help too.

Yes, that'd be nice. Gaëtan, can you post the bgwriter-related options from postgresql.conf and two snapshots of pg_stat_bgwriter (say 5 minutes apart, collected while the db is slow)? A complete 'iotop -o' output would be nice too.

BTW what filesystem are we talking about? What mount options have you used?

What about /proc/sys/vm/dirty_expire_centisecs?

Tomas
Hello Tomas and Cédric,

Right now, the server is not all right. Load is above 30 and queries are slow like hell.

Here's the complete iotop. Note the 71 MB/s writes (apparently on SELECT queries).

Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
27352 be/4 postgres 15.64 K/s 86.01 K/s 0.00 % 99.99 % postgres: database database 176.31.228.6(38816) SELECT
20226 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 99.99 % postgres: database database 176.31.228.6(34166) SELECT
26950 be/4 postgres 23.46 K/s 0.00 B/s 0.00 % 82.14 % postgres: database database 46.105.104.205(40820) SELECT
23160 be/4 postgres 3.91 K/s 0.00 B/s 0.00 % 81.14 % postgres: database database 46.105.104.205(58091) SELECT
29184 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 79.17 % postgres: database database 46.105.104.205(51047) SELECT
27271 be/4 postgres 23.46 K/s 234.58 K/s 0.00 % 77.15 % postgres: database database 46.105.104.205(42315) SELECT
28224 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 74.09 % postgres: database database 46.105.104.205(49871) SELECT
27450 be/4 postgres 23.46 K/s 0.00 B/s 0.00 % 70.12 % postgres: database database 46.105.104.205(49096) SELECT
29206 be/4 postgres 121.20 K/s 0.00 B/s 0.00 % 62.99 % postgres: database database 176.31.228.6(39787) SELECT
8202 be/4 postgres 129.02 K/s 0.00 B/s 0.00 % 60.59 % postgres: database database 91.121.89.14(57291) SELECT
29209 be/4 postgres 3.91 K/s 0.00 B/s 0.00 % 55.75 % postgres: database database 176.31.228.6(39807) idle
27046 be/4 postgres 3.91 K/s 0.00 B/s 0.00 % 52.35 % postgres: database database 46.105.104.205(41285) SELECT
29217 be/4 postgres 54.73 K/s 0.00 B/s 0.00 % 50.18 % postgres: database database 46.105.104.205(51178) SELECT
3249 be/4 root 3.91 K/s 320.59 K/s 0.00 % 40.31 % [kjournald]
28918 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 34.83 % postgres: database database 46.105.104.205(50164) SELECT
26856 be/4 postgres 7.82 K/s 234.58 K/s 0.00 % 34.30 % postgres: database database 46.105.104.205(40589) SELECT
29205 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 34.17 % postgres: database database 176.31.228.6(39784) SELECT
4175 be/4 root 0.00 B/s 0.00 B/s 0.00 % 1.84 % [flush-8:0]
28905 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.02 % postgres: database database 46.105.104.205(50125) SELECT
28919 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.01 % postgres: database database 46.105.104.205(50167) SELECT
19807 be/4 postgres 3.91 K/s 0.00 B/s 0.00 % 0.00 % postgres: database database 176.31.228.6(33598) SELECT
28703 be/4 postgres 0.00 B/s 86.01 K/s 0.00 % 0.00 % postgres: database database 176.31.228.6(39260) SELECT
29183 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 0.00 % postgres: database database 46.105.104.205(51044) SELECT

Here is dirty_expire_centisecs :

cat /proc/sys/vm/dirty_expire_centisecs
3000

Bgwriter configuration is default :

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0 # 0-10.0 multiplier on buffers scanned/round

Is there anything I can provide to help you?

Thanks again,

Gaëtan
On 25 November 2011 23:47, Gaëtan Allart <gaetan@nexylan.com> wrote:
> Right now, the server is not all right. Load is above 30 and queries are
> slow like hell.
> [...]
> Is there anything I can provide to help you?

The checkpoint logs and the output of pg_stat_bgwriter (as asked by Tomas).

It is probable that shared_buffers is too small for your workload (expected); do you still have issues with checkpoint sync duration?

You should increase shared_buffers (you can iterate) *and* try to avoid the long checkpoint syncs (keep shared_buffers <= 8GB, as already mentioned upthread).

The output you provided does not show everything, and the SELECTs do not look like they are writing that much. Which processes are writing?

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Here are the latest checkpoint logs :

LOG: checkpoint complete: wrote 842 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=168.970 s, sync=0.005 s, total=168.977 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 318 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=63.818 s, sync=0.006 s, total=63.825 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 744 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=147.035 s, sync=0.006 s, total=147.043 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 108 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=35.410 s, sync=14.921 s, total=54.811 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 393 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=88.835 s, sync=43.210 s, total=135.728 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 914 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=148.162 s, sync=14.249 s, total=170.481 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 202 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=53.152 s, sync=0.004 s, total=53.159 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 897 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=42.414 s, sync=1.175 s, total=58.957 s
LOG: checkpoint starting: shutdown immediate
LOG: checkpoint complete: wrote 666 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.027 s, sync=1.600 s, total=1.630 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 627 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=125.856 s, sync=0.006 s, total=125.864 s
LOG: checkpoint starting: time

And pg_stat_bgwriter :

postgres=# select *, now() from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |              now
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-------------------------------
               388 |              13 |             494948 |       4306591 |            13555 |         7458743 |    7835244602 | 2011-11-26 00:43:47.232924+01
(1 row)

postgres=# select *, now() from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |              now
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-------------------------------
               389 |              13 |             501802 |       4352198 |            13809 |         7469220 |    7839778941 | 2011-11-26 00:49:00.680779+01
(1 row)

The processes that were writing were "SELECT" queries against the database.

Gaëtan
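Spikes like the 43-second sync in the checkpoint logs above are easy to pick out mechanically. A small illustrative helper (not any official PostgreSQL tool; the regex is keyed to the 9.0-style log format shown above):

```python
import re

# Pull buffers written and the write/sync/total durations out of a
# 9.0-style "checkpoint complete" log line. Illustrative only; the
# regex is keyed to the log format shown in this thread.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete: wrote (\d+) buffers.*"
    r"write=([\d.]+) s, sync=([\d.]+) s, total=([\d.]+) s"
)

def parse_checkpoint(line):
    m = CHECKPOINT_RE.search(line)
    if m is None:
        return None  # e.g. a "checkpoint starting" line
    return {"buffers": int(m.group(1)),
            "write_s": float(m.group(2)),
            "sync_s": float(m.group(3)),
            "total_s": float(m.group(4))}

line = ("LOG:  checkpoint complete: wrote 393 buffers (0.0%); "
        "0 transaction log file(s) added, 0 removed, 0 recycled; "
        "write=88.835 s, sync=43.210 s, total=135.728 s")
print(parse_checkpoint(line))
```

Run over a few hours of logs, this makes it obvious whether the long sync phases correlate with the slow periods.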
On 26 November 2011, 00:17, Cédric Villemain wrote:
> On 25 November 2011 23:47, Gaëtan Allart <gaetan@nexylan.com> wrote:
>> Right now, the server is not all right. Load is above 30 and queries
>> are slow like hell.
>>
>> Here's the complete iotop. Note the 71 MB/s writes (apparently on
>> SELECT queries).
>>
>> Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
>> 27352 be/4 postgres 15.64 K/s 86.01 K/s 0.00 % 99.99 % postgres: database database 176.31.228.6(38816) SELECT
>> 20226 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 99.99 % postgres: database database 176.31.228.6(34166) SELECT
>> 26950 be/4 postgres 23.46 K/s 0.00 B/s 0.00 % 82.14 % postgres: database database 46.105.104.205(40820) SELECT
>> 23160 be/4 postgres 3.91 K/s 0.00 B/s 0.00 % 81.14 % postgres: database database 46.105.104.205(58091) SELECT
>> 29184 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 79.17 % postgres: database database 46.105.104.205(51047) SELECT

Gaëtan, you keep deleting the interesting rows for some reason. None of the rows you've posted writes more than a few KB/s - what about the rows that actually write those 71 MB/s?

>> Here is dirty_expire_centisecs :
>>
>> cat /proc/sys/vm/dirty_expire_centisecs
>> 3000

OK. That's the default value and it's usually too high (just like the ratios), but don't change it until we find out what's wrong.

>> Bgwriter configuration is default :
>>
>> #bgwriter_delay = 200ms # 10-10000ms between rounds
>> #bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
>> #bgwriter_lru_multiplier = 2.0 # 0-10.0 multiplier on buffers scanned/round

OK.

>> Is there anything I can provide to help you?
>
> The checkpoint logs and the output of pg_stat_bgwriter (as asked by
> Tomas).
>
> It is probable that shared_buffers is too small for your workload
> (expected); do you still have issues with checkpoint sync duration?

I don't think that's the case.
Too small shared buffers usually cause a lot of reads (especially when all the queries are SELECTs, as here), but he has problems with writes. And according to the pg_stat_bgwriter output Gaëtan posted, the checkpoints wrote about 54MB, the bgwriter about 370MB, and the backends about 80MB (during the 5 minutes between the snapshots). So I'm really wondering where those 70MB/s of writes come from.

Two things just crossed my mind. The first one is hint bits - these may cause a SELECT to write a lot of data. But I guess this is included in the pg_stat_bgwriter stats.

The second one is on-disk sorting - this happens when a query needs to sort so much data that it can't be done in work_mem, so the data is pushed to disk, and AFAIK it's not included in pg_stat_bgwriter. But he has work_mem set to 128MB, so it's rather unlikely.

Gaëtan, can you verify that the queries that write the most data to disk are not performing any huge sorts etc.? (Use iotop to see which postgres process is writing a lot of data, then look up its PID in pg_stat_activity to find out which query it's executing.)

Tomas
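Those back-of-the-envelope MB figures can be reproduced from the two pg_stat_bgwriter snapshots posted upthread, again assuming the default 8 kB block size; a sketch:

```python
# Difference the two pg_stat_bgwriter snapshots posted upthread
# (taken roughly 5 minutes apart) and convert buffer counts to MB
# written during the interval. Illustrative arithmetic only.
BLOCK_SIZE = 8192  # default BLCKSZ, in bytes

snap1 = {"buffers_checkpoint": 494948,
         "buffers_clean": 4306591,
         "buffers_backend": 7458743}   # taken at 00:43:47
snap2 = {"buffers_checkpoint": 501802,
         "buffers_clean": 4352198,
         "buffers_backend": 7469220}   # taken at 00:49:00

interval_s = 313  # 00:43:47 -> 00:49:00

total_mb = 0.0
for source in snap1:
    mb = (snap2[source] - snap1[source]) * BLOCK_SIZE / (1024 * 1024)
    total_mb += mb
    print(f"{source}: {mb:.0f} MB in {interval_s} s")
print(f"total: {total_mb:.0f} MB (~{total_mb / interval_s:.1f} MB/s)")
```

This gives roughly 54 MB from checkpoints, ~356 MB from the bgwriter and ~82 MB from backends - in line with Tomas's rough figures, and well under 2 MB/s combined, nowhere near the 70 MB/s iotop reports. That supports the suspicion that the heavy writes are not going through shared buffers at all.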
A better view of iotop :

TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
16974 be/4 postgres 46.33 K/s 0.00 B/s 0.00 % 7.21 % postgres: database database 46.105.111.92(54930) idle
383 be/4 postgres 7.72 K/s 0.00 B/s 0.00 % 1.56 % postgres: database database 46.105.104.205(51706) idle in transaction
15934 be/4 postgres 0.00 B/s 38.61 K/s 0.00 % 0.04 % postgres: wal writer process
31487 be/4 postgres 0.00 B/s 11.32 M/s 0.00 % 0.00 % postgres: database database 46.105.104.205(48616) SELECT
29541 be/4 postgres 0.00 B/s 11.52 M/s 0.00 % 0.00 % postgres: database database 46.105.104.205(40675) SELECT
31419 be/4 postgres 0.00 B/s 5.99 M/s 0.00 % 0.00 % postgres: database database 46.105.104.205(48476) SELECT
31875 be/4 postgres 0.00 B/s 15.23 M/s 0.00 % 0.00 % postgres: database database 46.105.104.205(50228) SELECT
30985 be/4 postgres 0.00 B/s 10.55 M/s 0.00 % 0.00 % postgres: database database 46.105.104.205(47672) SELECT
30902 be/4 postgres 0.00 B/s 6.04 M/s 0.00 % 0.00 % postgres: database database 176.31.228.6(43295) SELECT
30903 be/4 postgres 0.00 B/s 4.79 M/s 0.00 % 0.00 % postgres: database database 46.105.104.205(47565) SELECT
15933 be/4 postgres 0.00 B/s 77.22 K/s 0.00 % 0.00 % postgres: writer process

As you can see, the SELECTs are writing heavily to disk.

At this moment, pg_stat_activity shows many, many queries running at the same time. It looks like all queries suddenly start writing to disk - not one particular query - which makes me think of a buffer issue or something.
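Tomas's earlier suggestion - match the heavy writers against pg_stat_activity - can be mechanized on output like the above. A hypothetical sketch (the parsing is keyed to this iotop layout and `heavy_writers` is not a standard tool):

```python
import re

# Extract (pid, write rate in MB/s) from iotop-style rows so the PIDs
# can then be looked up in pg_stat_activity. Keyed to the iotop layout
# shown above; illustrative only, not a standard tool.
ROW_RE = re.compile(
    r"^\s*(\d+)\s+\S+\s+postgres\s+"   # TID, PRIO, USER
    r"[\d.]+\s+[KMB]/s\s+"             # DISK READ (ignored)
    r"([\d.]+)\s+([KMB])/s"            # DISK WRITE rate and unit
)
MB_PER_UNIT = {"B": 1 / (1024 * 1024), "K": 1 / 1024, "M": 1.0}

def heavy_writers(lines, min_mb_s=1.0):
    """Return (pid, MB/s written) for rows above the threshold, busiest first."""
    found = []
    for row in lines:
        m = ROW_RE.match(row)
        if not m:
            continue
        write_mb = float(m.group(2)) * MB_PER_UNIT[m.group(3)]
        if write_mb >= min_mb_s:
            found.append((int(m.group(1)), write_mb))
    return sorted(found, key=lambda t: -t[1])

sample = [
    "31487 be/4 postgres 0.00 B/s 11.32 M/s 0.00 % 0.00 % postgres: database database 46.105.104.205(48616) SELECT",
    "31875 be/4 postgres 0.00 B/s 15.23 M/s 0.00 % 0.00 % postgres: database database 46.105.104.205(50228) SELECT",
    "15933 be/4 postgres 0.00 B/s 77.22 K/s 0.00 % 0.00 % postgres: writer process",
]
print(heavy_writers(sample))
```

Each returned PID can then be checked in pg_stat_activity to see the offending statement (on 9.0 the relevant columns are procpid and current_query; they were renamed in later releases).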
This is an example of a query writing 10 MB/s :

SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id", "table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id", "table"."url_article" AS "url_article", "table"."original_url" AS "original_url", "table"."name" AS "name", "table"."description" AS "description", "table"."content" AS "content", "table"."permis" AS "permis", "table"."reviewed" AS "reviewed", "table"."author_id" AS "author_id", "table"."poster_id" AS "poster_id", "table"."post_date" AS "post_date", "table"."edit_date" AS "edit_date", "table"."add_date" AS "add_date", "table"."comments_open" AS "comments_open", "table"."site_id" AS "site_id", "table"."is_local" AS "is_local", "table"."status" AS "status", "table"."visits" AS "visits", "table"."votes" AS "votes", "table"."score" AS "score", "arti

Checkpoint logs still show very long write times :

LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s, total=63.625 s

Couldn't this be a hardware issue?

Best regards,

Gaëtan

Le 26/11/11 04:42, « Tomas Vondra » <tv@fuzzy.cz> a écrit :

>Dne 26.11.2011 00:17, Cédric Villemain napsal(a):
>> Le 25 novembre 2011 23:47, Gaëtan Allart <gaetan@nexylan.com> a écrit :
>>> Hello Tomas and Cédric,
>>>
>>> Right now, the server is not all right. Load is above 30 and queries are
>>> slow like hell.
>>>
>>> Here's the complete iotop. Note the 71 MB/s writes (apparently on SELECT
>>> queries).
>>> >>> Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s >>> TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND >>> >>> >>> 27352 be/4 postgres 15.64 K/s 86.01 K/s 0.00 % 99.99 % postgres: >>> database database 176.31.228.6(38816) SELECT >>> 20226 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 99.99 % postgres: >>> database database 176.31.228.6(34166) SELECT >>> 26950 be/4 postgres 23.46 K/s 0.00 B/s 0.00 % 82.14 % postgres: >>> database database 46.105.104.205(40820) SELECT >>> 23160 be/4 postgres 3.91 K/s 0.00 B/s 0.00 % 81.14 % postgres: >>> database database 46.105.104.205(58091) SELECT >>> 29184 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 79.17 % postgres: >>> database database 46.105.104.205(51047) SELECT > >Gaetan, you keep deleting the interesting rows for some reason. None of >the rows you've posted writes more than a few KB/s - what about the rows >that actually write those 71MB/s? > >>> Here is dirty_expire_centisecs : >>> >>> cat /proc/sys/vm/dirty_expire_centisecs >>> 3000 > >OK. That's a default value and it's usually too high (just like the >ratios), but don't change it until we find out what's wrong. > >>> Bgwriter configuration is default : >>> >>> #bgwriter_delay = 200ms # 10-10000ms between rounds >>> #bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round >>> #bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on buffers >>>scanned/round > >OK. > >>> Is there anything I can provide to help you ? >> >> the checkpoints logs and the output of pg_stat_bgwriter (as asked by >>Tomas). >> >> It is probable that shared_buffers are too small for your workload >> (expected), do you still have issue with checkpoint sync duration ? > >I don't think that's the case. Too small shared buffers usually cause a >lot of reads (especially when all the queries are SELECTs as here), but >he has problems with writes. 
> >And according to the pg_stat_bgwriter Gaetan posted, the checkpoints >wrote about 54MB, bgwriter about 370MB and backends about 80MB (during >the 5 minutes between snapshots). > >So I'm really wondering where those 70MB/s of writes come from. > >Two things just crossed my mind. The first one are hint bits - this may >cause a SELECT to write a lot of data. But I guess this is included in >the pg_stat_bgwriter stats. > >The second one is on-disk sorting - this happens when a query needs to >sort so much data it can't be done in work_mem, so the data are pushed >to the disk, and AFAIK it's not included into the pg_stat_bgwriter. > >But he has work_mem set to 128MB so it's rather unlikely. Gaetan, can >you verify that those queries that write the most data to the disk are >not performing any huge sorts etc? (See iotop which postgres process is >writing a lot of data and use the PID and pg_stat_activity to find out >which query it's executing.) > >Tomas > >-- >Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-general
On 26 Listopad 2011, 10:45, Gaëtan Allart wrote: > A better view of iotop : > > TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND > 31875 be/4 postgres 0.00 B/s 15.23 M/s 0.00 % 0.00 % postgres: > database database 46.105.104.205(50228) SELECT > 30985 be/4 postgres 0.00 B/s 10.55 M/s 0.00 % 0.00 % postgres: > database database 46.105.104.205(47672) SELECT > > As you can see, SELECTS are writing very much on the disk. > > At this moment, pg_stat_activity show many many queries running at the > same time. > It looks like that all queries are suddenly writing on disk and not a > particular one, making me think of a buffer issue or something. No, I don't think this is an issue with the size of shared buffers. That'd influence the pg_stat_bgwriter - the buffers_backend would grow much faster, and it's not the case. So I'm guessing it's either hint bits or sorting. If I had to, I'd probably guess about the hint bits - see for example this thread http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php it more or less behaves like what you described so far. 
> This is an example of a query writing 10 MB/s :
> [...]

Post the EXPLAIN ANALYZE output of this query (use explain.depesz.com to post it).

> Checkpoint logs still show very long write times :
>
> LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
> file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
> total=63.625 s

No, that's fine - that's what spread checkpoints do. Once checkpoint_timeout expires, the system decides to perform a checkpoint, i.e. it has to write all dirty (modified) shared buffers to the disk. But it knows the next checkpoint will happen in checkpoint_timeout, so it has about 5 minutes to write all the data.

So it says something like 'I need to write 540MB and want to write that in 270 seconds (5 minutes * completion_target), so I'll write at 2MB/s'. So the write phase is expected to take a long time. But it has to sync the data at the end, and that's where the problems usually happen - so the important thing is 'sync', and that improved significantly. 6 seconds is not great, but it's not a big issue I guess.

> Couldn't this be a hardware issue?

I don't think so.
The problem is that the queries write a lot of data, and the SSD can't help in this case. SSDs provide much better random performance (compared to spinners), but the sequential performance is not much better. So pushing 70MB/s to the SSD may fully utilize it.

Have you tested the basic performance (with bonnie++, dd, etc.) before going into production? There are some common SSD-related mistakes (e.g. not aligning the partitions properly) that may easily cause a 30% performance loss. Fixing that won't solve the problem, although you could get a bit better performance.

Tomas
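To gather the EXPLAIN ANALYZE output Tomas asks for, one option is the parenthesized syntax introduced in 9.0, whose BUFFERS option also reports blocks read and written per plan node - exactly what is needed to see where the writes come from. This is only a sketch; the "..." elides the rest of the original query:

```sql
-- BUFFERS (available since 9.0) adds shared/temp block counts per node.
-- The /* ... */ and FROM ... stand for the elided parts of the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM (SELECT DISTINCT "table"."id" AS "id" /* ... */ FROM ...) AS sub;
```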
On Fri, Nov 25, 2011 at 6:48 PM, Gaëtan Allart <gaetan@nexylan.com> wrote:
> Here are the latest checkpoint logs :
>
> LOG: checkpoint complete: wrote 842 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=168.970 s, sync=0.005 s, total=168.977 s
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 318 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=63.818 s, sync=0.006 s, total=63.825 s
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 744 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=147.035 s, sync=0.006 s, total=147.043 s
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 108 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=35.410 s, sync=14.921 s, total=54.811 s
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 393 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=88.835 s, sync=43.210 s, total=135.728 s
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 914 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=148.162 s, sync=14.249 s, total=170.481 s
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 202 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=53.152 s, sync=0.004 s, total=53.159 s
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 897 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=42.414 s, sync=1.175 s, total=58.957 s
> LOG: checkpoint starting: shutdown immediate
> LOG: checkpoint complete: wrote 666 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.027 s, sync=1.600 s, total=1.630 s
> LOG: checkpoint starting: time
> LOG: checkpoint complete: wrote 627 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=125.856 s, sync=0.006 s, total=125.864 s
> LOG: checkpoint starting: time
>
> And pg_stat_bgwriter :
>
> postgres=# select *,now() from pg_stat_bgwriter;
>  checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |              now
> -------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-------------------------------
>                388 |              13 |             494948 |       4306591 |            13555 |         7458743 |    7835244602 | 2011-11-26 00:43:47.232924+01
> (1 row)
>
> postgres=# select *,now() from pg_stat_bgwriter;
>  checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |              now
> -------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-------------------------------
>                389 |              13 |             501802 |       4352198 |            13809 |         7469220 |    7839778941 | 2011-11-26 00:49:00.680779+01
> (1 row)
>
> Processes that were writing were "SELECT" queries against database.

What filesystem is involved? What is the underlying disk layout (you said it's SSDs, but what type, how many, and in what configuration)? Also, how is Postgres set up on top of the disks (all of $PGDATA and the OS on one volume? Split up?)

Also, how many active connections do you typically have? Can you reduce your sort mem to something like 4MB, and set log_temp_files to 0?

Robert Treat
conjecture: xzilla.net
consulting: omniti.com
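Robert's two suggestions can be applied without a restart; a sketch using his proposed values:

```sql
-- In postgresql.conf (both settings are reloadable, no restart needed):
--   work_mem = 4MB          -- down from 128MB, per Robert's suggestion
--   log_temp_files = 0      -- log every temporary file, regardless of size
-- then reload:
SELECT pg_reload_conf();

-- work_mem can also be lowered for a single session while testing:
SET work_mem = '4MB';
```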
Hello Robert,

I'm having around 30 active connections (average).

As far as the disk layout is concerned, everything's on the same disk (RAID 1 with 2 SSDs).

Gaëtan

Le 26/11/11 15:25, « Robert Treat » <rob@xzilla.net> a écrit :

> [...]
Rahh :/

It's getting worse and worse :/ The database has to be restarted every 2 hours, causing much traffic loss :/

As far as the server is concerned, it was running great 7 days ago and had been running like this for months. I really don't get why it suddenly went "I/O-ing"…

Here's the current postgresql.conf :

shared_buffers = 6GB                  # min 128kB (change requires restart)
temp_buffers = 40MB                   # min 800kB (change requires restart)
work_mem = 96MB                       # min 64kB
maintenance_work_mem = 256MB          # min 1MB
max_stack_depth = 2MB                 # min 100kB
effective_cache_size = 32GB
max_files_per_process = 8192          # min 25 (change requires restart)
fsync = on                            # turns forced synchronization on or off
synchronous_commit = off              # immediate fsync at commit
wal_buffers = 16MB                    # min 32kB
checkpoint_segments = 40              # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min             # range 30s-1h
checkpoint_completion_target = 0.9    # checkpoint target duration, 0.0 - 1.0
seq_page_cost = 1.0                   # measured on an arbitrary scale
random_page_cost = 2.0                # same scale as above

Isn't there anything I can do to keep my database up & running, even with bad performance?

Filesystem is ext3, running over a hardware RAID-1 config.

Gaëtan

Le 26/11/11 15:12, « Tomas Vondra » <tv@fuzzy.cz> a écrit :

> [...]
Uhm… I'm seeing dozens and dozens of temporary file creations in the logs :

LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1425", size 25340
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.195", size 2720340
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3495", size 24724
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2674", size 2712452
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1850", size 25284
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.822", size 2717464
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1050", size 25060
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.501", size 2700248
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3768", size 23156
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2132", size 2713204
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1886", size 23744
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.311", size 2829600
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3849", size 25088
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2352", size 2770352
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1267", size 25592
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.765", size 2803744
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3775", size 24444
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2686", size 2858836
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1094", size 24948
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.106", size 2800140
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1399", size 23912
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.211", size 2761788
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3747", size 27188
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2639", size 2749672
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1053", size 24276
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.452", size 2948712

Does this help?

Gaëtan

Le 26/11/11 17:47, « Gaëtan Allart » <gaetan@nexylan.com> a écrit :

> [...]
Dne 26.11.2011 18:08, Gaëtan Allart napsal(a): > Uhm… > > I'm seeing dozens and dozens of temporary file creations in logs : > > LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1399", size 23912 > LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.211", size > 2761788 > LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3747", size 27188 > LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2639", size > 2749672 > LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1053", size 24276 > LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15388.452", size > 2948712 Please, add timestamp to log_line_prefix so that we know how frequently that happens. Anyway the output is a bit strange. It's writing a lot of temp files that are significantly smaller (about 3MB) than work_mem (128MB). Tomas
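Adding the timestamp Tomas asks for is a reload-only change; a minimal sketch (the '%t [%p] ' prefix is an assumed choice, not spelled out in the thread):

```sql
-- In postgresql.conf (takes effect on config reload):
--   log_line_prefix = '%t [%p] '   -- timestamp and backend PID on each line
-- Verify what is currently in effect:
SHOW log_line_prefix;
```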
Tomas Vondra <tv@fuzzy.cz> writes: > Anyway the output is a bit strange. It's writing a lot of temp files > that are significantly smaller (about 3MB) than work_mem (128MB). The most obvious guess about what's doing that is a hash join that has a drastic overestimate of how many rows it has to hash, so that it's batching the join unnecessarily. AFAIR, sorts don't spill to disk until they've actually reached work_mem, but hash join works differently. regards, tom lane
Dne 26.11.2011 17:47, Gaëtan Allart napsal(a):
> Rahh :/
>
> It's getting worse and worse :/ Database has to be restarted every 2 hours
> causing much traffic loss :/
>
> As far as the server is concerned, it was running great 7 days ago and had
> been running like this for months. I really don't get why it suddenly went
> "I/O-ing"…

Given the info provided so far, we can rule out checkpoint issues. These sudden changes in performance happen when the stats are off or when vacuum is not able to maintain all the tables.

1) Post EXPLAIN ANALYZE of the queries that cause a lot of writes.

2) How much did the database grow recently?

3) Is the autovacuum running fine? Check the logs, and maybe set log_autovacuum_min_duration = 100 or a different value. Set the log_line_prefix (add '%t').

4) Check pg_stat_all_tables for:
- large n_dead_tup values (especially compared to n_live_tup)
- tables with a lot of changes (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) that were not vacuumed / analyzed recently

Tomas
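Check (4) can be sketched as a single query against pg_stat_all_tables (all of these columns exist on 9.0; the LIMIT is an arbitrary choice):

```sql
-- Tables with the most dead rows and the heaviest churn, plus when they
-- were last vacuumed/analyzed.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd AS churn,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```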
On 26.11.2011 19:08, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> Anyway the output is a bit strange. It's writing a lot of temp files
>> that are significantly smaller (about 3MB) than work_mem (128MB).
>
> The most obvious guess about what's doing that is a hash join that has
> a drastic overestimate of how many rows it has to hash, so that it's
> batching the join unnecessarily. AFAIR, sorts don't spill to disk until
> they've actually reached work_mem, but hash join works differently.

OK, so it's probably not sorting. What about other nodes that use a hash table - aggregates etc.? Would those spill to the disk in case of overestimation?

Gaetan did not post the whole query, but it starts like this:

SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id",
"table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id", ...

so it obviously uses a DISTINCT clause, probably evaluated using a hash aggregate. And this is allegedly one of the queries that write a lot.

Tomas
Actually, this is because I changed sort_mem to 4MB, as asked by Robert.

I removed this setting.

Gaëtan

Le 26/11/11 18:58, « Tomas Vondra » <tv@fuzzy.cz> a écrit :

> [...]
On 27 Listopad 2011, 10:45, Gaëtan Allart wrote:
> Actually, this is because I changed sort_mem to 4 Mb as asked by Robert.
>
> I removed this setting..

This is exactly why it's recommended not to make any changes until it's reasonably certain what the cause is. Did increasing work_mem again solve the issues? Is the database running fine? If not, post the iotop output again so we can see what the impact was. And post the EXPLAIN ANALYZE of the query, so we can see if the estimates are OK.

Anyway, Robert was right - work_mem=128MB is rather large and should be decreased. Use something like a binary search to find the right value - start with 64MB and watch the log for temp file messages. If there are a lot of messages, choose the upper interval and set (64+128)/2 = 96MB; otherwise 32MB. And so on. That should lead you to the minimal workable work_mem value.

Tomas
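The binary search Tomas describes can be run without touching postgresql.conf by overriding work_mem per session while watching the server log for "temporary file" messages (assuming log_temp_files = 0 is set, as Robert suggested earlier). The values below are just the starting points from the thread:

```sql
-- Try the midpoint, run the workload, check the log for temp files...
SET work_mem = '64MB';
-- ...temp files logged? move up: (64+128)/2
SET work_mem = '96MB';
-- ...none logged? move down instead:
SET work_mem = '32MB';
```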