Обсуждение: Server crashed and now experiencing slow running queries

Поиск

Список

Период

Сортировка

Server crashed and now experiencing slow running queries

От

Keaton Adams

Дата:

05 декабря 2007 г., 04:58:27

We have two servers configured the same way running the same type of processes that write/read to the database. Server 2 filled up pg_xlog and crashed. When it came back we began to experience slow query performance. I ran an ANALYZE against the tables involved in the query, but for some reason the optimizer is still choosing a hash join instead of a nested loop join, which is killing performance. The query on Server 1 runs in 14 seconds and the same query on Server 2 runs in 15 minutes:

Server 1
                           -> HashAggregate (cost=501922.84..501922.85 rows=1 width=532)
                                 -> Nested Loop (cost=250961.41..501922.83 rows=1 width=532)
                                       Join Filter: (("outer".host)::text = ("inner".host)::text)
                                       -> HashAggregate (cost=250961.40..250961.41 rows=1 width=26)
                                             -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..250961.40 rows=1 width=26)
                                                   Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
                                       -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..250961.40 rows=1 width=42)
                                             Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

Server 2

                           -> HashAggregate (cost=1814101.48..1814129.36 rows=2230 width=532)
                                 -> Hash Join (cost=906978.28..1814079.18 rows=2230 width=532)
                                       Hash Cond: (("outer".host)::text = ("inner".host)::text)
                                       -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..906877.88 rows=40147 width=42)
                                             Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
                                       -> Hash (cost=906978.27..906978.27 rows=1 width=516)
                                             -> HashAggregate (cost=906978.25..906978.26 rows=1 width=26)
                                                   -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..906877.88 rows=40147 width=26)
                                                         Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

Besides ANALYZE, what else can I do / look at to figure out why the optimizer is making the choices it is on Server 2, now causing slow performance problems?

Thanks,

Keaton

Re: Server crashed and now experiencing slow running queries

От

Keaton Adams

Дата:

05 декабря 2007 г., 05:17:02

We’re running PostgreSQL 8.1.4 on RHEL. I’m running a vacuum analyze on the mxl_fs_size table to see if that shows anything.

-Keaton

On 12/4/07 10:50 PM, "Keaton Adams" <kadams@mxlogic.com> wrote:

We have two servers configured the same way running the same type of processes that write/read to the database. Server 2 filled up pg_xlog and crashed. When it came back we began to experience slow query performance. I ran an ANALYZE against the tables involved in the query, but for some reason the optimizer is still choosing a hash join instead of a nested loop join, which is killing performance. The query on Server 1 runs in 14 seconds and the same query on Server 2 runs in 15 minutes:

Server 1
                           -> HashAggregate (cost=501922.84..501922.85 rows=1 width=532)
                                 -> Nested Loop (cost=250961.41..501922.83 rows=1 width=532)
                                       Join Filter: (("outer".host)::text = ("inner".host)::text)
                                       -> HashAggregate (cost=250961.40..250961.41 rows=1 width=26)
                                             -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..250961.40 rows=1 width=26)
                                                   Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
                                       -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..250961.40 rows=1 width=42)
                                             Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

Server 2

                           -> HashAggregate (cost=1814101.48..1814129.36 rows=2230 width=532)
                                 -> Hash Join (cost=906978.28..1814079.18 rows=2230 width=532)
                                       Hash Cond: (("outer".host)::text = ("inner".host)::text)
                                       -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..906877.88 rows=40147 width=42)
                                             Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
                                       -> Hash (cost=906978.27..906978.27 rows=1 width=516)
                                             -> HashAggregate (cost=906978.25..906978.26 rows=1 width=26)
                                                   -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..906877.88 rows=40147 width=26)
                                                         Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

Besides ANALYZE, what else can I do / look at to figure out why the optimizer is making the choices it is on Server 2, now causing slow performance problems?

Thanks,

Keaton

Re: Server crashed and now experiencing slow running queries

От

"Peter Childs"

Дата:

05 декабря 2007 г., 07:54:36

On 05/12/2007, Keaton Adams <kadams@mxlogic.com> wrote:

We're running PostgreSQL 8.1.4 on RHEL. I'm running a vacuum analyze on the mxl_fs_size table to see if that shows anything.

-Keaton

On 12/4/07 10:50 PM, "Keaton Adams" <kadams@mxlogic.com> wrote:

We have two servers configured the same way running the same type of processes that write/read to the database. Server 2 filled up pg_xlog and crashed. When it came back we began to experience slow query performance. I ran an ANALYZE against the tables involved in the query, but for some reason the optimizer is still choosing a hash join instead of a nested loop join, which is killing performance. The query on Server 1 runs in 14 seconds and the same query on Server 2 runs in 15 minutes:

Server 1
                           -> HashAggregate (cost=501922.84..501922.85 rows=1 width=532)
                                 -> Nested Loop (cost=250961.41..501922.83 rows=1 width=532)
                                       Join Filter: (("outer".host)::text = ("inner".host)::text)
                                       -> HashAggregate (cost=250961.40..250961.41 rows=1 width=26)
                                             -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..250961.40 rows=1 width=26)
                                                   Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
                                       -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..250961.40 rows=1 width=42)
                                             Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

Server 2

                           -> HashAggregate (cost=1814101.48..1814129.36 rows=2230 width=532)
                                 -> Hash Join (cost=906978.28..1814079.18 rows=2230 width=532)
                                       Hash Cond: (("outer".host)::text = ("inner".host)::text)
                                       -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..906877.88 rows=40147 width=42)
                                             Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
                                       -> Hash (cost=906978.27..906978.27 rows=1 width=516)
                                             -> HashAggregate (cost=906978.25..906978.26 rows=1 width=26)
                                                   -> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..906877.88 rows=40147 width=26)
                                                         Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

Besides ANALYZE, what else can I do / look at to figure out why the optimizer is making the choices it is on Server 2, now causing slow performance problems?

Look at table pg_stats eg

select * from pg_stats where tablename='msl_fs_size';

also read, http://www.postgresql.org/docs/8.2/interactive/planner-stats.html and
http://www.postgresql.org/docs/8.2/interactive/planner-stats-details.html

Can be a little hard going but it might help. On the other hand it might just baffle you with science.

Peter

Thanks,

Keaton

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Server crashed and now experiencing slow running queries

Server crashed and now experiencing slow running queries

Re: Server crashed and now experiencing slow running queries

Re: Server crashed and now experiencing slow running queries