Обсуждение: Simple queries slowdown, maybe related to 3-minute long " in transaction"?

Поиск
Список
Период
Сортировка

Simple queries slowdown, maybe related to 3-minute long " in transaction"?

От
Dmitry Koterov
Дата:
Hello.

Sometimes I see a strange slowdown on my PG 9.1 server: it looks like the simplest queries which typically take 1ms or less (e.g. selection of a row by its primary key) take 300ms or even more. It is related to all queries within the connection, not the single one: once upon a time all fast queries start to work 300 times slower. In a couple of minutes the effect disappears. The machine is not overloaded, number of connections is not too high, mtr shows no packet loss.

E.g. see the following log from a particular website page I've been refreshing (times are in milliseconds, you may see that typically all 22 queries take near 55ms, but within 3 minutes - from 20:21 till 20:23 approx - even simplest queries took more than 300ms instead of 1ms, so the total time is many seconds).

web-01: [2012-11-20 20:20:57] web07725] total=250        cpu=210             php=203        dbq=22      db=55ms
web-01: [2012-11-20 20:21:06] web07752] total=6254       cpu=220             php=219        dbq=22      db=6035ms    <--
web-01: [2012-11-20 20:21:48] web07718] total=7506       cpu=220             php=222        dbq=22      db=7284ms    <--
web-01: [2012-11-20 20:22:15] web07717] total=11143      cpu=224             php=225        dbq=22      db=10918ms   <--
web-01: [2012-11-20 20:23:44] web07725] total=4402       cpu=224             php=220        dbq=22      db=4182ms    <--
web-01: [2012-11-20 20:24:03] web07725] total=268        cpu=212             php=211        dbq=22      db=57ms

The machine is not overloaded during all this period (I've been logging "top" and "pstree" each 10 seconds, results are quite typical):

Nov 20 20:23:07 pg-m03 mon: [Tue Nov 20 20:23:07 MSK 2012]
Nov 20 20:23:07 pg-m03 mon: init-+-crond
Nov 20 20:23:07 pg-m03 mon:      |-master-+-pickup
Nov 20 20:23:07 pg-m03 mon:      |        `-qmgr
Nov 20 20:23:07 pg-m03 mon:      |-postmaster---32*[postmaster]
Nov 20 20:23:07 pg-m03 mon:      |-screen---sh---su---bash---monitoring.sh-+-logger
Nov 20 20:23:07 pg-m03 mon:      |                                         `-monitoring.sh---pstree
Nov 20 20:23:07 pg-m03 mon:      |-sshd---sshd---perl
Nov 20 20:23:07 pg-m03 mon:      |-syslog-ng
Nov 20 20:23:07 pg-m03 mon:      |-udevd
Nov 20 20:23:07 pg-m03 mon:      |-vzctl---bash---mc---bash
Nov 20 20:23:07 pg-m03 mon:      `-xinetd
Nov 20 20:23:07 pg-m03 mon:
Nov 20 20:23:08 pg-m03 mon: top - 20:23:08 up 113 days,  3:29,  0 users,  load average: 2.16, 2.10, 2.00
Nov 20 20:23:08 pg-m03 mon: Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Nov 20 20:23:08 pg-m03 mon: Cpu(s): 23.8%us,  2.8%sy,  0.0%ni, 72.7%id,  0.8%wa,  0.0%hi,  0.0%si,  0.0%st
Nov 20 20:23:08 pg-m03 mon: Mem:   1463616k total,  1097776k used,   365840k free,        0k buffers

Also I've been logging all queries which are active in the database each 10 seconds. During the slowdown the database was not overloaded, no strange queries, except one thing: during the whole slowdown period two pg processes was in "<IDLE> in transaction" state, and in 3 minutes (see 3rd column - it's a "time from start of the query") these processes disappeared, and the slowdown disappeared too:

                                     | duration        | query
Nov 20 20:22:05 pg-m03 mon:    21586 | 00:00:37.571057 | <IDLE> in transaction
Nov 20 20:22:05 pg-m03 mon:    21567 | 00:00:37.557606 | <IDLE> in transaction
Nov 20 20:22:36 pg-m03 mon:    21586 | 00:01:09.119639 | <IDLE> in transaction
Nov 20 20:22:36 pg-m03 mon:    21567 | 00:01:09.106188 | <IDLE> in transaction
Nov 20 20:22:47 pg-m03 mon:    21586 | 00:01:19.637888 | <IDLE> in transaction
Nov 20 20:22:47 pg-m03 mon:    21567 | 00:01:19.624437 | <IDLE> in transaction
Nov 20 20:22:57 pg-m03 mon:    21586 | 00:01:30.155492 | <IDLE> in transaction
Nov 20 20:22:57 pg-m03 mon:    21567 | 00:01:30.142041 | <IDLE> in transaction
Nov 20 20:24:21 pg-m03 mon:    21586 | 00:02:54.294799 | <IDLE> in transaction
Nov 20 20:24:21 pg-m03 mon:    21567 | 00:02:54.281348 | <IDLE> in transaction
Nov 20 20:24:31 - no more "<IDLE> in transaction" (seems they disappeared after 3 minutes timeout?)

Maybe these are correlated? But how could it be that two "idle" processes cause slowdown of other connections (note: not "timeout", but exactly - slowdown from 1ms to 300ms)?

What could be the problem of all these things? Such slowdown typically happens 2-3 times per day, no direct correlation with checkpoints, disks loads or something else. It just starts to work slow even on simplest queries, and then it restores normal operations.

Re: Simple queries slowdown, maybe related to 3-minute long " in transaction"?

От
Merlin Moncure
Дата:
On Tue, Nov 20, 2012 at 1:25 PM, Dmitry Koterov <dmitry@koterov.ru> wrote:
> Hello.
>
> Sometimes I see a strange slowdown on my PG 9.1 server: it looks like the
> simplest queries which typically take 1ms or less (e.g. selection of a row
> by its primary key) take 300ms or even more. It is related to all queries
> within the connection, not the single one: once upon a time all fast queries
> start to work 300 times slower. In a couple of minutes the effect
> disappears. The machine is not overloaded, number of connections is not too
> high, mtr shows no packet loss.
>
> E.g. see the following log from a particular website page I've been
> refreshing (times are in milliseconds, you may see that typically all 22
> queries take near 55ms, but within 3 minutes - from 20:21 till 20:23 approx
> - even simplest queries took more than 300ms instead of 1ms, so the total
> time is many seconds).
>
> web-01: [2012-11-20 20:20:57] web07725] total=250        cpu=210
> php=203        dbq=22      db=55ms
> web-01: [2012-11-20 20:21:06] web07752] total=6254       cpu=220
> php=219        dbq=22      db=6035ms    <--
> web-01: [2012-11-20 20:21:48] web07718] total=7506       cpu=220
> php=222        dbq=22      db=7284ms    <--
> web-01: [2012-11-20 20:22:15] web07717] total=11143      cpu=224
> php=225        dbq=22      db=10918ms   <--
> web-01: [2012-11-20 20:23:44] web07725] total=4402       cpu=224
> php=220        dbq=22      db=4182ms    <--
> web-01: [2012-11-20 20:24:03] web07725] total=268        cpu=212
> php=211        dbq=22      db=57ms
>
> The machine is not overloaded during all this period (I've been logging
> "top" and "pstree" each 10 seconds, results are quite typical):
>
> Nov 20 20:23:07 pg-m03 mon: [Tue Nov 20 20:23:07 MSK 2012]
> Nov 20 20:23:07 pg-m03 mon: init-+-crond
> Nov 20 20:23:07 pg-m03 mon:      |-master-+-pickup
> Nov 20 20:23:07 pg-m03 mon:      |        `-qmgr
> Nov 20 20:23:07 pg-m03 mon:      |-postmaster---32*[postmaster]
> Nov 20 20:23:07 pg-m03 mon:
> |-screen---sh---su---bash---monitoring.sh-+-logger
> Nov 20 20:23:07 pg-m03 mon:      |
> `-monitoring.sh---pstree
> Nov 20 20:23:07 pg-m03 mon:      |-sshd---sshd---perl
> Nov 20 20:23:07 pg-m03 mon:      |-syslog-ng
> Nov 20 20:23:07 pg-m03 mon:      |-udevd
> Nov 20 20:23:07 pg-m03 mon:      |-vzctl---bash---mc---bash
> Nov 20 20:23:07 pg-m03 mon:      `-xinetd
> Nov 20 20:23:07 pg-m03 mon:
> Nov 20 20:23:08 pg-m03 mon: top - 20:23:08 up 113 days,  3:29,  0 users,
> load average: 2.16, 2.10, 2.00
> Nov 20 20:23:08 pg-m03 mon: Tasks:   1 total,   0 running,   1 sleeping,   0
> stopped,   0 zombie
> Nov 20 20:23:08 pg-m03 mon: Cpu(s): 23.8%us,  2.8%sy,  0.0%ni, 72.7%id,
> 0.8%wa,  0.0%hi,  0.0%si,  0.0%st
> Nov 20 20:23:08 pg-m03 mon: Mem:   1463616k total,  1097776k used,   365840k
> free,        0k buffers
>
> Also I've been logging all queries which are active in the database each 10
> seconds. During the slowdown the database was not overloaded, no strange
> queries, except one thing: during the whole slowdown period two pg processes
> was in "<IDLE> in transaction" state, and in 3 minutes (see 3rd column -
> it's a "time from start of the query") these processes disappeared, and the
> slowdown disappeared too:
>
>                                      | duration        | query
> Nov 20 20:22:05 pg-m03 mon:    21586 | 00:00:37.571057 | <IDLE> in
> transaction
> Nov 20 20:22:05 pg-m03 mon:    21567 | 00:00:37.557606 | <IDLE> in
> transaction
> Nov 20 20:22:36 pg-m03 mon:    21586 | 00:01:09.119639 | <IDLE> in
> transaction
> Nov 20 20:22:36 pg-m03 mon:    21567 | 00:01:09.106188 | <IDLE> in
> transaction
> Nov 20 20:22:47 pg-m03 mon:    21586 | 00:01:19.637888 | <IDLE> in
> transaction
> Nov 20 20:22:47 pg-m03 mon:    21567 | 00:01:19.624437 | <IDLE> in
> transaction
> Nov 20 20:22:57 pg-m03 mon:    21586 | 00:01:30.155492 | <IDLE> in
> transaction
> Nov 20 20:22:57 pg-m03 mon:    21567 | 00:01:30.142041 | <IDLE> in
> transaction
> Nov 20 20:24:21 pg-m03 mon:    21586 | 00:02:54.294799 | <IDLE> in
> transaction
> Nov 20 20:24:21 pg-m03 mon:    21567 | 00:02:54.281348 | <IDLE> in
> transaction
> Nov 20 20:24:31 - no more "<IDLE> in transaction" (seems they disappeared
> after 3 minutes timeout?)
>
> Maybe these are correlated? But how could it be that two "idle" processes
> cause slowdown of other connections (note: not "timeout", but exactly -
> slowdown from 1ms to 300ms)?
>
> What could be the problem of all these things? Such slowdown typically
> happens 2-3 times per day, no direct correlation with checkpoints, disks
> loads or something else. It just starts to work slow even on simplest
> queries, and then it restores normal operations.

Hm. Maybe application is leaking transaction and locks along with it?
Using any connection pooling -- in particular, one managed on client
side?  One of the first things to check in slow down period is
pg_locks view: both in terms of raw # of granted locks and any
ungranted ones.

merlin