Discussion: weird performance problem
Hi all,

We have been operating a 1.5 GB PostgreSQL database for a year, and for nearly a month we have been having problems. Usually everything is fine: queries execute quickly even when they are complicated, but sometimes, for about half an hour, we see a general slowdown.

The server is a dedicated quad Xeon with 4 GB of RAM, a RAID 1 array for the system and a RAID 10 array for the PostgreSQL data. We have very few updates/inserts/deletes during the day. The PostgreSQL version is 7.4.8.

- the database is vacuumed and analyzed regularly (but we are not using autovacuum) and usually performs pretty well;
- I/O is OK and the database fits entirely in RAM (see the top and iostat output below);
- CPUs are usually 25% idle; load never really grows and its maximum is below 5;
- I attached two plans for a simple query: one from when the server is fast, the other from during a slowdown. It is exactly the same plan, but as you can see, the time to fetch the first row from the indexes is quite high in the slow case;
- during a slowdown, queries that usually take 500 ms can take up to 60 seconds (and sometimes even more);
- we have up to 130 permanent connections from our Apache servers during a slowdown, as we have a lot of Apache children waiting for a response.

I also attached vmstat output. Context switches are quite high, but I'm not sure it qualifies as a context switch storm, or that it is the cause of our problem.

Thanks for any advice or idea to help us understand this problem and hopefully solve it.

Regards,

--
Guillaume

[root@bd root]# iostat 10

Device:     tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda        7.20         0.00        92.00          0        920
sda1       0.00         0.00         0.00          0          0
sda2       6.40         0.00        78.40          0        784
sda3       0.00         0.00         0.00          0          0
sda4       0.00         0.00         0.00          0          0
sda5       0.00         0.00         0.00          0          0
sda6       0.80         0.00        13.60          0        136
sdb        5.00         0.00       165.60          0       1656
sdb1       5.00         0.00       165.60          0       1656

Device:     tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda        1.30         0.00        20.80          0        208
sda1       0.00         0.00         0.00          0          0
sda2       0.70         0.00         9.60          0         96
sda3       0.00         0.00         0.00          0          0
sda4       0.00         0.00         0.00          0          0
sda5       0.00         0.00         0.00          0          0
sda6       0.60         0.00        11.20          0        112
sdb        5.40         0.00       173.60          0       1736
sdb1       5.40         0.00       173.60          0       1736

Device:     tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda        2.20         0.00        28.00          0        280
sda1       0.00         0.00         0.00          0          0
sda2       2.20         0.00        28.00          0        280
sda3       0.00         0.00         0.00          0          0
sda4       0.00         0.00         0.00          0          0
sda5       0.00         0.00         0.00          0          0
sda6       0.00         0.00         0.00          0          0
sdb        5.20         0.00       171.20          0       1712
sdb1       5.20         0.00       171.20          0       1712

Plan during a fast period:

                          QUERY PLAN
----------------------------------------------------------------------
 Nested Loop  (cost=0.00..13.52 rows=2 width=1119) (actual time=0.154..0.167 rows=1 loops=1)
   ->  Index Scan using pk_newslang on newslang nl  (cost=0.00..3.87 rows=1 width=1004) (actual time=0.053..0.055 rows=1 loops=1)
         Index Cond: (((codelang)::text = 'FRA'::text) AND (3498704 = numnews))
   ->  Nested Loop Left Join  (cost=0.00..9.61 rows=2 width=119) (actual time=0.050..0.059 rows=1 loops=1)
         ->  Index Scan using pk_news on news n  (cost=0.00..3.31 rows=2 width=98) (actual time=0.021..0.023 rows=1 loops=1)
               Index Cond: (numnews = 3498704)
         ->  Index Scan using pk_photo on photo p  (cost=0.00..3.14 rows=1 width=25) (actual time=0.021..0.025 rows=1 loops=1)
               Index Cond: (p.numphoto = "outer".numphoto)
 Total runtime: 0.320 ms
(9 rows)

Plan during a slowdown:

                          QUERY PLAN
----------------------------------------------------------------------
 Nested Loop  (cost=0.00..13.52 rows=2 width=1119) (actual time=155.286..155.305 rows=1 loops=1)
   ->  Index Scan using pk_newslang on newslang nl  (cost=0.00..3.87 rows=1 width=1004) (actual time=44.575..44.579 rows=1 loops=1)
         Index Cond: (((codelang)::text = 'FRA'::text) AND (3498704 = numnews))
   ->  Nested Loop Left Join  (cost=0.00..9.61 rows=2 width=119) (actual time=110.648..110.660 rows=1 loops=1)
         ->  Index Scan using pk_news on news n  (cost=0.00..3.31 rows=2 width=98) (actual time=0.169..0.174 rows=1 loops=1)
               Index Cond: (numnews = 3498704)
         ->  Index Scan using pk_photo on photo p  (cost=0.00..3.14 rows=1 width=25) (actual time=110.451..110.454 rows=1 loops=1)
               Index Cond: (p.numphoto = "outer".numphoto)
 Total runtime: 155.514 ms
(9 rows)

top output:

 17:08:41  up 19 days, 15:16,  1 user,  load average: 4.03, 4.26, 4.36
288 processes: 285 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   59.0%    0.0%    8.8%   0.2%     0.0%    0.0%   31.9%
           cpu00   52.3%    0.0%   13.3%   0.9%     0.0%    0.0%   33.3%
           cpu01   65.7%    0.0%    7.6%   0.0%     0.0%    0.0%   26.6%
           cpu02   58.0%    0.0%    7.6%   0.0%     0.0%    0.0%   34.2%
           cpu03   60.0%    0.0%    6.6%   0.0%     0.0%    0.0%   33.3%
Mem:  3857224k av, 3495880k used,  361344k free,       0k shrd,   92160k buff
      2374048k actv,  463576k in_d,   37708k in_c
Swap: 4281272k av,   25412k used, 4255860k free                  2173392k cached

[root@bd root]# vmstat 10
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 4  0  25412 250160  91736 2170944    0    1     3     3    4     3  5  3  3  0
 7  0  25412 247160  91736 2171024    0    0     0   116  588  4483 65  4 31  0
 3  0  25412 235456  91752 2171132    0    0     0   129  491  3670 70  4 26  0
 5  0  25412 233696  91760 2171216    0    0     0   152  530  4768 61  4 34  0
 5  0  25412 233248  91768 2171232    0    0     0   183  624  5379 59  5 36  0
 9  0  25412 195332  91788 2171304    0    0     0   127  541  4811 58  5 37  0
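Since the slowdowns come and go, continuously logging timestamped vmstat/iostat samples makes the bad half-hours much easier to analyse after the fact. A minimal sketch of such a sampler (the log path and the 30 s interval are arbitrary choices, not anything from this setup):

#!/bin/sh
# Append a timestamped snapshot of system activity every 30 seconds.
while true; do
    echo "=== $(date) ==="
    vmstat 1 2 | tail -1    # second line is the current 1 s sample
    iostat 1 2              # second report is the current 1 s sample
    sleep 30
done >> /var/log/perf-snapshots.log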
On Thu, Nov 17, 2005 at 06:47:09PM +0100, Guillaume Smet wrote:
> queries execute quickly even when they are complicated, but
> sometimes, for about half an hour, we see a general slowdown.

Is it exactly half an hour? What changes at the time that happens (i.e. what else happens on the machine)? Is this a time, for example, when logrotate is killing your I/O with file moves?

A

--
Andrew Sullivan | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly
what you told them to. That actually seems sort of quaint now.
        --J.D. Baldwin
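A quick way to test the logrotate hypothesis is to compare the slowdown windows against the machine's cron schedule. A sketch, assuming the stock Red Hat locations for cron and the logrotate state file:

# When do cron-driven jobs (including logrotate, via cron.daily) fire?
cat /etc/crontab
ls -l /etc/cron.hourly /etc/cron.daily /etc/cron.weekly
# logrotate records when it last rotated each log:
cat /var/lib/logrotate.status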
Andrew,

Andrew Sullivan wrote:
> Is it exactly half an hour? What changes at the time that happens
> (i.e. what else happens on the machine)? Is this a time, for
> example, when logrotate is killing your I/O with file moves?

No, it's not exactly half an hour. It just slows down for some time (10, 20, 30 minutes) and then is OK again. It happens several times a day. I checked whether other processes were running during the slowdowns, but that is not the case.

There is not really any difference between the good and bad periods (apart from the number of connections, because the db is too slow): load stays at 4 or 5, iowait stays at 0%, there is still idle CPU, and we still have free memory. I can't figure out what the limiting factor is, or why there is idle CPU left over.

I forgot to give our non-default postgresql.conf parameters:

shared_buffers = 28800
sort_mem = 32768
vacuum_mem = 32768
max_fsm_pages = 350000
max_fsm_relations = 2000
checkpoint_segments = 16
effective_cache_size = 270000
random_page_cost = 2

Thanks for your help

--
Guillaume
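Several of these settings use different units (8 kB pages vs. kB), so the absolute sizes are easy to misread. A quick conversion, using 7.4's units and the default 8 kB block size:

# shared_buffers and effective_cache_size are counted in 8 kB pages;
# sort_mem and vacuum_mem are in kB.
echo "shared_buffers:       $((28800 * 8 / 1024)) MB"       # 225 MB
echo "effective_cache_size: $((270000 * 8 / 1024)) MB"      # ~2109 MB
echo "sort_mem:             $((32768 / 1024)) MB per sort"  # 32 MB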
> I forgot to give our non-default postgresql.conf parameters:
> shared_buffers = 28800
> sort_mem = 32768
> vacuum_mem = 32768
> max_fsm_pages = 350000
> max_fsm_relations = 2000
> checkpoint_segments = 16
> effective_cache_size = 270000
> random_page_cost = 2

Isn't sort_mem quite high? Remember that sort_mem is allocated for each sort, not for each connection. Mine is 4096 (4 MB). My effective_cache_size is set to 27462.

What OS are you running?

regards
Claus
On Fri, Nov 18, 2005 at 12:35:06AM +0100, Guillaume Smet wrote:
> sort_mem = 32768

I would be very suspicious of that much memory for sort. Please see the docs for what that does. That is the amount that _each sort_ can allocate before spilling to disk. If some set of your users are causing complicated queries with, say, four sorts apiece, then each user is potentially allocating 4x that much memory. That's going to wreak havoc on your disk buffers (which are tricky to monitor on most systems, and impossible on some).

This'd be the first knob I'd twiddle, for sure.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
It is above all style through which power defers to reason.
        --J. Robert Oppenheimer
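To put numbers on that warning using figures from this thread (130 connections during a slowdown, sort_mem = 32768 kB, and Andrew's hypothetical four sorts per query):

# Worst-case sort memory under those assumptions: more than four
# times the 4 GB in the machine.
echo "$((130 * 4 * 32)) MB"   # 16640 MB, roughly 16 GB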
"Guillaume Smet" <guillaume.smet@openwide.fr> wrote > [root@bd root]# iostat 10 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > sda 7.20 0.00 92.00 0 920 > sda1 0.00 0.00 0.00 0 0 > sda2 6.40 0.00 78.40 0 784 > sda3 0.00 0.00 0.00 0 0 > sda4 0.00 0.00 0.00 0 0 > sda5 0.00 0.00 0.00 0 0 > sda6 0.80 0.00 13.60 0 136 > sdb 5.00 0.00 165.60 0 1656 > sdb1 5.00 0.00 165.60 0 1656 > > Nested Loop (cost=0.00..13.52 rows=2 width=1119) (actual > time=155.286..155.305 rows=1 loops=1) > -> Index Scan using pk_newslang on newslang nl (cost=0.00..3.87 rows=1 > width=1004) (actual time=44.575..44.579 rows=1 loops=1) > Index Cond: (((codelang)::text = 'FRA'::text) AND (3498704 = > numnews)) > -> Nested Loop Left Join (cost=0.00..9.61 rows=2 width=119) (actual > time=110.648..110.660 rows=1 loops=1) > -> Index Scan using pk_news on news n (cost=0.00..3.31 rows=2 > width=98) (actual time=0.169..0.174 rows=1 loops=1) > Index Cond: (numnews = 3498704) > -> Index Scan using pk_photo on photo p (cost=0.00..3.14 rows=1 > width=25) (actual time=110.451..110.454 rows=1 loops=1) > Index Cond: (p.numphoto = "outer".numphoto) > Total runtime: 155.514 ms > Someone is doing a massive *write* at this time, which makes your query *read* quite slow. Can you find out which process is doing write? Regards, Qingqing
Andrew,

> I would be very suspicious of that much memory for sort. Please see
> the docs for what that does. That is the amount that _each sort_ can
> allocate before spilling to disk. If some set of your users are
> causing complicated queries with, say, four sorts apiece, then each
> user is potentially allocating 4x that much memory. That's going to
> wreak havoc on your disk buffers (which are tricky to monitor on
> most systems, and impossible on some).

Yes, we do indeed have complicated queries; that's why we set sort_mem so high. I'll see if we can set it lower for the next few days, and whether that improves our performance.

Thanks for your help.

--
Guillaume
Qingqing Zhou wrote:
> Someone is doing a massive *write* at this time, which makes your
> *reads* quite slow. Can you find out which process is doing the
> writing?

The indexes should be in memory, so I wouldn't expect a massive write to slow down the select queries. sdb is the RAID 10 array dedicated to our data, so the postgresql process is the only one writing to it. I'll check which write queries are running, because there should be very few updates/inserts on our db during the day.

From a four-day log analysis, I have the following counts:

SELECT  403,964
INSERT      574
UPDATE      393
DELETE       26

So a massive write during the day would not be normal.

Thanks for your help

--
Guillaume
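Counts like these can be reproduced straight from the server log. A sketch, assuming log_statement = true and a plain-file log named serverlog (both assumptions); 7.4 writes lines of the form "LOG:  statement: SELECT ...":

# Tally statement types over the retained log.
grep -ioE 'statement: *(select|insert|update|delete)' serverlog \
  | awk '{print toupper($2)}' | sort | uniq -c | sort -rn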
If I understand your HW config correctly, all of the pg stuff is on the same RAID 10 set? If so, give WAL its own dedicated RAID 10 set. This looks like the old problem of everything stalling while WAL is being committed to HD.

This concept works for other tables as well. If you have tables that both want service at the same time, disk arm contention will drag performance through the floor when they are on the same HW set. Profile your HD access and put tables that want to be accessed at the same time on different HD sets, even if you have to buy more HW to do it.

Ron

At 04:56 AM 11/22/2005, Guillaume Smet wrote:
>The indexes should be in memory, so I wouldn't expect a massive write
>to slow down the select queries. sdb is the RAID 10 array dedicated
>to our data, so the postgresql process is the only one writing to it.
>I'll check which write queries are running, because there should be
>very few updates/inserts on our db during the day.
>
>From a four-day log analysis, I have the following counts:
>SELECT  403,964
>INSERT      574
>UPDATE      393
>DELETE       26
>
>So a massive write during the day would not be normal.
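If checkpoint stalls are the problem, 7.4's checkpoint_warning parameter is a cheap way to spot them, with the caveat that it only flags checkpoints triggered by xlog traffic. A sketch (the log file name is an assumption):

# In postgresql.conf, warn when checkpoints happen less than 30 s apart:
#     checkpoint_warning = 30
# After a slow period, look for the warning in the log:
grep -i 'checkpoints are occurring too frequently' serverlog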
Ron wrote:
> If I understand your HW config correctly, all of the pg stuff is on
> the same RAID 10 set?

No, the system and the WAL are on a RAID 1 array and the data is on its own RAID 10 array. As I said earlier, there are only a few writes to the database, so I'm not really sure the WAL can be a limitation: IIRC, it's only used for writes, isn't it? Wouldn't you expect some iowait if the database were waiting on the WAL? We _never_ have any iowait on this server, yet our CPUs are still 30-40% idle. A typical top on this server looks like:

 15:22:39  up 24 days, 13:30,  2 users,  load average: 3.86, 3.96, 3.99
156 processes: 153 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   50.6%    0.0%    4.7%   0.0%     0.6%    0.0%   43.8%
           cpu00   47.4%    0.0%    3.1%   0.3%     1.5%    0.0%   47.4%
           cpu01   43.7%    0.0%    3.7%   0.0%     0.5%    0.0%   51.8%
           cpu02   58.9%    0.0%    7.7%   0.0%     0.1%    0.0%   33.0%
           cpu03   52.5%    0.0%    4.1%   0.0%     0.1%    0.0%   43.0%
Mem:  3857224k av, 3307416k used,  549808k free,       0k shrd,   80640k buff
      2224424k actv,  482552k in_d,   49416k in_c
Swap: 4281272k av,   10032k used, 4271240k free                  2602424k cached

As you can see, we don't swap, we have free memory, and we have all our data cached (our database size is 1.5 GB). Context switches are between 10,000 and 20,000 per second.

> This concept works for other tables as well. If you have tables that
> both want service at the same time, disk arm contention will drag
> performance through the floor when they are on the same HW set.
> Profile your HD access and put tables that want to be accessed at
> the same time on different HD sets, even if you have to buy more HW
> to do it.

I use iostat and I can only see a little write activity and no read activity on both RAID arrays.

--
Guillaume
Claus and Andrew,

Claus Guttesen wrote:
> Isn't sort_mem quite high? Remember that sort_mem is allocated for
> each sort, not for each connection. Mine is 4096 (4 MB). My
> effective_cache_size is set to 27462.

I tested sort_mem from 4096 to 32768 (4096, 8192, 16384, 32768) this afternoon, and 32768 is definitely the best value for us. We still have free memory with it, we don't swap, and queries are generally faster (I log all queries taking more than 500 ms, and the log grows far faster with lower values of sort_mem).

> What OS are you running?

# cat /etc/redhat-release
CentOS release 3.6 (Final)

so it's a RHEL 3 update 6.

# uname -a
Linux our.host 2.4.21-37.ELsmp #1 SMP Wed Sep 28 14:05:46 EDT 2005 i686 i686 i386 GNU/Linux

# cat /proc/cpuinfo   (4 identical CPUs; last entry shown)
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) MP CPU 2.20GHz
stepping        : 6
cpu MHz         : 2192.976
cache size      : 512 KB

# cat /proc/meminfo
        total:    used:    free:  shared: buffers:   cached:
Mem:  3949797376 3478376448 471420928       0 83410944 2679156736
Swap: 4384022528    9797632 4374224896
MemTotal:      3857224 kB
MemFree:        460372 kB
MemShared:           0 kB
Buffers:         81456 kB
Cached:        2610764 kB

HTH

Regards,

--
Guillaume
At 09:26 AM 11/22/2005, Guillaume Smet wrote:
>Ron wrote:
>>If I understand your HW config correctly, all of the pg stuff is on
>>the same RAID 10 set?
>
>No, the system and the WAL are on a RAID 1 array and the data is on
>its own RAID 10 array.

As has been noted many times around here, put the WAL on its own dedicated HDs. You don't want any head movement on those HDs.

>As I said earlier, there are only a few writes to the database, so
>I'm not really sure the WAL can be a limitation: IIRC, it's only used
>for writes, isn't it?

When you reach a WAL checkpoint, pg commits WAL data to HD... and does almost nothing else until that commit is done.

>Wouldn't you expect some iowait if the database were waiting on the
>WAL? We _never_ have any iowait on this server, yet our CPUs are
>still 30-40% idle.

_Something_ is doing long bursts of write I/O on sdb and sdb1 every 30 minutes or so according to your previous posts. Profile your DBMS and find out what.

>A typical top on this server looks like:
> [top output snipped; see previous message]
>
>As you can see, we don't swap, we have free memory, and we have all
>our data cached (our database size is 1.5 GB). Context switches are
>between 10,000 and 20,000 per second.

That's actually a reasonably high CS rate. Again, why?

>>This concept works for other tables as well. If you have tables that
>>both want service at the same time, disk arm contention will drag
>>performance through the floor when they are on the same HW set.
>>Profile your HD access and put tables that want to be accessed at
>>the same time on different HD sets, even if you have to buy more HW
>>to do it.
>
>I use iostat and I can only see a little write activity and no read
>activity on both RAID arrays.

Remember it's not just the overall amount, it's _when_ and _where_ the write activity takes place. If you have almost no write activity, but whenever it happens it all happens to the same place, with multiple things contending for the same HDs, your performance during that time will be poor. Since the behavior you are describing fits that cause very well, I'd see if you can verify that's what's going on.

Ron
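A minimal version of that profiling is to sample, with timestamps, which processes are blocked on disk while the write bursts hit sdb. A sketch (the log path is arbitrary; state D in ps output means uninterruptible sleep, almost always disk wait; if iostat -x is not supported on this kernel, plain iostat still gives per-device tps):

#!/bin/sh
# Every 10 s, record extended device stats and any D-state processes,
# so write bursts on sdb can be tied to a specific process.
while true; do
    date
    iostat -x 1 1
    ps -eo pid,stat,wchan:25,args | awk '$2 ~ /D/'
    sleep 10
done >> /var/log/io-trace.log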
Ron,

First of all, thanks for your time.

> As has been noted many times around here, put the WAL on its own
> dedicated HDs. You don't want any head movement on those HDs.

Yep, I know that. We just assumed it was not so important for a nearly read-only database, which is wrong according to you.

> _Something_ is doing long bursts of write I/O on sdb and sdb1 every
> 30 minutes or so according to your previous posts.

It's not every 30 minutes. It's a 20-30 minute slowdown, 3-4 times a day, when we have a high load. Anyway, apart from this temporary problem, we have idle CPU all day long even though we never have any iowait (and nearly no writes) and the server is loaded enough to use all 4 CPUs. I don't think that's normal.

It's not a very good idea in general, but do you think we can set fsync to off for a few minutes to check whether the WAL really is our problem? A simple reload of the configuration seems to take it into account, so can we disable it temporarily even while the database is running? If so, I think we'll test it, and if it solves our problem we'll ask our customer to buy two new disks for a dedicated RAID 1 array for pg_xlog.

> That's actually a reasonably high CS rate. Again, why?

I'm not so surprised, considering what I have read about Xeon multiprocessors, pg 7.4 and the famous context switch storm. We are planning a migration to 8.1 to (hopefully) solve this problem. Perhaps our problem is due to that high CS rate.

--
Guillaume
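For what it's worth, fsync appears to be a reload-time (SIGHUP-context) setting in 7.4, which matches the observation above that a reload takes it into account. A sketch of the experiment, with the obvious caveat that fsync = false risks data loss if the machine crashes during the window (the data directory path and database name are assumptions):

# 1. In postgresql.conf:  fsync = false
# 2. Tell the postmaster to re-read its configuration:
pg_ctl reload -D /var/lib/pgsql/data
# 3. Confirm the running value:
psql -d mydb -c "SHOW fsync;"
# 4. Revert (fsync = true) the same way when the test window is over.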
At 10:26 AM 11/22/2005, Guillaume Smet wrote:
>Ron,
>
>First of all, thanks for your time.

Happy to help.

>>As has been noted many times around here, put the WAL on its own
>>dedicated HDs. You don't want any head movement on those HDs.
>
>Yep, I know that. We just assumed it was not so important for a
>nearly read-only database, which is wrong according to you.

It's just good practice with pg that pg_xlog should always get its own dedicated HD set. OTOH, I'm not at all convinced, given the scant evidence so far, that it is the primary problem here; particularly since, if I understand you correctly, pg_xlog is not on sdb or sdb1 where you are having the write storm.

>>_Something_ is doing long bursts of write I/O on sdb and sdb1 every
>>30 minutes or so according to your previous posts.
>
>It's not every 30 minutes. It's a 20-30 minute slowdown, 3-4 times a
>day, when we have a high load.

Thanks for the correction, and I apologize for the misunderstanding. Clearly the first step is to instrument sdb and sdb1 so that you understand exactly what is being accessed and written on them.

Possibilities that come to mind:

a) Are some of your sorts requiring more than 32 MB during high load? If sorts do not usually require HD temp files and suddenly do, you could get behavior like yours.

b) Are you doing too many 32 MB sorts during high load? Same comment as above.

c) Are you doing some sort of data updates or appends during high load that do not normally occur?

d) Are you constantly doing "a little" write I/O that turns into a write storm under high load because of queuing issues?

Put in the scaffolding needed to trace _exactly_ what's happening on sdb and sdb1 throughout the day, and then examine the traces over a day, a few days, and a week. I'll bet you will notice some patterns that will be helpful in identifying what's going on.

Ron
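If (a) or (b) is what's happening, the spills should be directly visible in the filesystem. A sketch, assuming PGDATA is /var/lib/pgsql/data and that this 7.4 install keeps sort temp files in per-database pgsql_tmp directories (an assumption worth verifying against the actual on-disk layout):

# Any files appearing here during a slow period indicate a sort that
# exceeded sort_mem and spilled to disk.
watch -n 5 'ls -l /var/lib/pgsql/data/base/*/pgsql_tmp/ 2>/dev/null'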