Обсуждение: Simple queries slowdown, maybe related to 3-minute long " in transaction"?

Поиск

Список

Период

Сортировка

Simple queries slowdown, maybe related to 3-minute long " in transaction"?

От

Dmitry Koterov

Дата:

20 ноября 2012 г., 22:25:45

Hello.

Sometimes I see a strange slowdown on my PG 9.1 server: it looks like the simplest queries which typically take 1ms or less (e.g. selection of a row by its primary key) take 300ms or even more. It is related to all queries within the connection, not the single one: once upon a time all fast queries start to work 300 times slower. In a couple of minutes the effect disappears. The machine is not overloaded, number of connections is not too high, mtr shows no packet loss.

E.g. see the following log from a particular website page I've been refreshing (times are in milliseconds, you may see that typically all 22 queries take near 55ms, but within 3 minutes - from 20:21 till 20:23 approx - even simplest queries took more than 300ms instead of 1ms, so the total time is many seconds).

web-01: [2012-11-20 20:20:57] web07725] total=250 cpu=210 php=203 dbq=22 db=55ms

web-01: [2012-11-20 20:21:06] web07752] total=6254 cpu=220 php=219 dbq=22 db=6035ms <--

web-01: [2012-11-20 20:21:48] web07718] total=7506 cpu=220 php=222 dbq=22 db=7284ms <--

web-01: [2012-11-20 20:22:15] web07717] total=11143 cpu=224 php=225 dbq=22 db=10918ms <--

web-01: [2012-11-20 20:23:44] web07725] total=4402 cpu=224 php=220 dbq=22 db=4182ms <--

web-01: [2012-11-20 20:24:03] web07725] total=268 cpu=212 php=211 dbq=22 db=57ms

The machine is not overloaded during all this period (I've been logging "top" and "pstree" each 10 seconds, results are quite typical):

Nov 20 20:23:07 pg-m03 mon: [Tue Nov 20 20:23:07 MSK 2012]

Nov 20 20:23:07 pg-m03 mon: init-+-crond

Nov 20 20:23:07 pg-m03 mon: |-master-+-pickup

Nov 20 20:23:07 pg-m03 mon: | `-qmgr

Nov 20 20:23:07 pg-m03 mon: |-postmaster---32*[postmaster]

Nov 20 20:23:07 pg-m03 mon: |-screen---sh---su---bash---monitoring.sh-+-logger

Nov 20 20:23:07 pg-m03 mon: | `-monitoring.sh---pstree

Nov 20 20:23:07 pg-m03 mon: |-sshd---sshd---perl

Nov 20 20:23:07 pg-m03 mon: |-syslog-ng

Nov 20 20:23:07 pg-m03 mon: |-udevd

Nov 20 20:23:07 pg-m03 mon: |-vzctl---bash---mc---bash

Nov 20 20:23:07 pg-m03 mon: `-xinetd

Nov 20 20:23:07 pg-m03 mon:

Nov 20 20:23:08 pg-m03 mon: top - 20:23:08 up 113 days, 3:29, 0 users, load average: 2.16, 2.10, 2.00

Nov 20 20:23:08 pg-m03 mon: Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie

Nov 20 20:23:08 pg-m03 mon: Cpu(s): 23.8%us, 2.8%sy, 0.0%ni, 72.7%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st

Nov 20 20:23:08 pg-m03 mon: Mem: 1463616k total, 1097776k used, 365840k free, 0k buffers

Also I've been logging all queries which are active in the database each 10 seconds. During the slowdown the database was not overloaded, no strange queries, except one thing: during the whole slowdown period two pg processes was in "<IDLE> in transaction" state, and in 3 minutes (see 3rd column - it's a "time from start of the query") these processes disappeared, and the slowdown disappeared too:

| duration | query

Nov 20 20:22:05 pg-m03 mon: 21586 | 00:00:37.571057 | <IDLE> in transaction

Nov 20 20:22:05 pg-m03 mon: 21567 | 00:00:37.557606 | <IDLE> in transaction

Nov 20 20:22:36 pg-m03 mon: 21586 | 00:01:09.119639 | <IDLE> in transaction

Nov 20 20:22:36 pg-m03 mon: 21567 | 00:01:09.106188 | <IDLE> in transaction

Nov 20 20:22:47 pg-m03 mon: 21586 | 00:01:19.637888 | <IDLE> in transaction

Nov 20 20:22:47 pg-m03 mon: 21567 | 00:01:19.624437 | <IDLE> in transaction

Nov 20 20:22:57 pg-m03 mon: 21586 | 00:01:30.155492 | <IDLE> in transaction

Nov 20 20:22:57 pg-m03 mon: 21567 | 00:01:30.142041 | <IDLE> in transaction

Nov 20 20:24:21 pg-m03 mon: 21586 | 00:02:54.294799 | <IDLE> in transaction

Nov 20 20:24:21 pg-m03 mon: 21567 | 00:02:54.281348 | <IDLE> in transaction

Nov 20 20:24:31 - no more "<IDLE> in transaction" (seems they disappeared after 3 minutes timeout?)

Maybe these are correlated? But how could it be that two "idle" processes cause slowdown of other connections (note: not "timeout", but exactly - slowdown from 1ms to 300ms)?

What could be the problem of all these things? Such slowdown typically happens 2-3 times per day, no direct correlation with checkpoints, disks loads or something else. It just starts to work slow even on simplest queries, and then it restores normal operations.

Re: Simple queries slowdown, maybe related to 3-minute long " in transaction"?

От

Merlin Moncure

Дата:

20 ноября 2012 г., 23:33:59

On Tue, Nov 20, 2012 at 1:25 PM, Dmitry Koterov <dmitry@koterov.ru> wrote:
> Hello.
>
> Sometimes I see a strange slowdown on my PG 9.1 server: it looks like the
> simplest queries which typically take 1ms or less (e.g. selection of a row
> by its primary key) take 300ms or even more. It is related to all queries
> within the connection, not the single one: once upon a time all fast queries
> start to work 300 times slower. In a couple of minutes the effect
> disappears. The machine is not overloaded, number of connections is not too
> high, mtr shows no packet loss.
>
> E.g. see the following log from a particular website page I've been
> refreshing (times are in milliseconds, you may see that typically all 22
> queries take near 55ms, but within 3 minutes - from 20:21 till 20:23 approx
> - even simplest queries took more than 300ms instead of 1ms, so the total
> time is many seconds).
>
> web-01: [2012-11-20 20:20:57] web07725] total=250        cpu=210
> php=203        dbq=22      db=55ms
> web-01: [2012-11-20 20:21:06] web07752] total=6254       cpu=220
> php=219        dbq=22      db=6035ms    <--
> web-01: [2012-11-20 20:21:48] web07718] total=7506       cpu=220
> php=222        dbq=22      db=7284ms    <--
> web-01: [2012-11-20 20:22:15] web07717] total=11143      cpu=224
> php=225        dbq=22      db=10918ms   <--
> web-01: [2012-11-20 20:23:44] web07725] total=4402       cpu=224
> php=220        dbq=22      db=4182ms    <--
> web-01: [2012-11-20 20:24:03] web07725] total=268        cpu=212
> php=211        dbq=22      db=57ms
>
> The machine is not overloaded during all this period (I've been logging
> "top" and "pstree" each 10 seconds, results are quite typical):
>
> Nov 20 20:23:07 pg-m03 mon: [Tue Nov 20 20:23:07 MSK 2012]
> Nov 20 20:23:07 pg-m03 mon: init-+-crond
> Nov 20 20:23:07 pg-m03 mon:      |-master-+-pickup
> Nov 20 20:23:07 pg-m03 mon:      |        `-qmgr
> Nov 20 20:23:07 pg-m03 mon:      |-postmaster---32*[postmaster]
> Nov 20 20:23:07 pg-m03 mon:
> |-screen---sh---su---bash---monitoring.sh-+-logger
> Nov 20 20:23:07 pg-m03 mon:      |
> `-monitoring.sh---pstree
> Nov 20 20:23:07 pg-m03 mon:      |-sshd---sshd---perl
> Nov 20 20:23:07 pg-m03 mon:      |-syslog-ng
> Nov 20 20:23:07 pg-m03 mon:      |-udevd
> Nov 20 20:23:07 pg-m03 mon:      |-vzctl---bash---mc---bash
> Nov 20 20:23:07 pg-m03 mon:      `-xinetd
> Nov 20 20:23:07 pg-m03 mon:
> Nov 20 20:23:08 pg-m03 mon: top - 20:23:08 up 113 days,  3:29,  0 users,
> load average: 2.16, 2.10, 2.00
> Nov 20 20:23:08 pg-m03 mon: Tasks:   1 total,   0 running,   1 sleeping,   0
> stopped,   0 zombie
> Nov 20 20:23:08 pg-m03 mon: Cpu(s): 23.8%us,  2.8%sy,  0.0%ni, 72.7%id,
> 0.8%wa,  0.0%hi,  0.0%si,  0.0%st
> Nov 20 20:23:08 pg-m03 mon: Mem:   1463616k total,  1097776k used,   365840k
> free,        0k buffers
>
> Also I've been logging all queries which are active in the database each 10
> seconds. During the slowdown the database was not overloaded, no strange
> queries, except one thing: during the whole slowdown period two pg processes
> was in "<IDLE> in transaction" state, and in 3 minutes (see 3rd column -
> it's a "time from start of the query") these processes disappeared, and the
> slowdown disappeared too:
>
>                                      | duration        | query
> Nov 20 20:22:05 pg-m03 mon:    21586 | 00:00:37.571057 | <IDLE> in
> transaction
> Nov 20 20:22:05 pg-m03 mon:    21567 | 00:00:37.557606 | <IDLE> in
> transaction
> Nov 20 20:22:36 pg-m03 mon:    21586 | 00:01:09.119639 | <IDLE> in
> transaction
> Nov 20 20:22:36 pg-m03 mon:    21567 | 00:01:09.106188 | <IDLE> in
> transaction
> Nov 20 20:22:47 pg-m03 mon:    21586 | 00:01:19.637888 | <IDLE> in
> transaction
> Nov 20 20:22:47 pg-m03 mon:    21567 | 00:01:19.624437 | <IDLE> in
> transaction
> Nov 20 20:22:57 pg-m03 mon:    21586 | 00:01:30.155492 | <IDLE> in
> transaction
> Nov 20 20:22:57 pg-m03 mon:    21567 | 00:01:30.142041 | <IDLE> in
> transaction
> Nov 20 20:24:21 pg-m03 mon:    21586 | 00:02:54.294799 | <IDLE> in
> transaction
> Nov 20 20:24:21 pg-m03 mon:    21567 | 00:02:54.281348 | <IDLE> in
> transaction
> Nov 20 20:24:31 - no more "<IDLE> in transaction" (seems they disappeared
> after 3 minutes timeout?)
>
> Maybe these are correlated? But how could it be that two "idle" processes
> cause slowdown of other connections (note: not "timeout", but exactly -
> slowdown from 1ms to 300ms)?
>
> What could be the problem of all these things? Such slowdown typically
> happens 2-3 times per day, no direct correlation with checkpoints, disks
> loads or something else. It just starts to work slow even on simplest
> queries, and then it restores normal operations.

Hm. Maybe application is leaking transaction and locks along with it?
Using any connection pooling -- in particular, one managed on client
side?  One of the first things to check in slow down period is
pg_locks view: both in terms of raw # of granted locks and any
ungranted ones.

merlin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Simple queries slowdown, maybe related to 3-minute long " in transaction"?

Simple queries slowdown, maybe related to 3-minute long " in transaction"?

Re: Simple queries slowdown, maybe related to 3-minute long " in transaction"?