Обсуждение: Server crashed and now experiencing slow running queries
We have two servers configured the same way running the same type of processes that write/read to the database. Server 2 filled up pg_xlog and crashed. When it came back we began to experience slow query performance. I ran an ANALYZE against the tables involved in the query, but for some reason the optimizer is still choosing a hash join instead of a nested loop join, which is killing performance. The query on Server 1 runs in 14 seconds and the same query on Server 2 runs in 15 minutes:
Server 1
-> HashAggregate (cost=501922.84..501922.85 rows=1 width=532)
-> Nested Loop (cost=250961.41..501922.83 rows=1 width=532)
Join Filter: (("outer".host)::text = ("inner".host)::text)
-> HashAggregate (cost=250961.40..250961.41 rows=1 width=26)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..250961.40 rows=1 width=26)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..250961.40 rows=1 width=42)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
Server 2
-> HashAggregate (cost=1814101.48..1814129.36 rows=2230 width=532)
-> Hash Join (cost=906978.28..1814079.18 rows=2230 width=532)
Hash Cond: (("outer".host)::text = ("inner".host)::text)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..906877.88 rows=40147 width=42)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
-> Hash (cost=906978.27..906978.27 rows=1 width=516)
-> HashAggregate (cost=906978.25..906978.26 rows=1 width=26)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..906877.88 rows=40147 width=26)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
Besides ANALYZE, what else can I do / look at to figure out why the optimizer is making the choices it is on Server 2, now causing slow performance problems?
Thanks,
Keaton
We’re running PostgreSQL 8.1.4 on RHEL. I’m running a vacuum analyze on the mxl_fs_size table to see if that shows anything.
-Keaton
On 12/4/07 10:50 PM, "Keaton Adams" <kadams@mxlogic.com> wrote:
We have two servers configured the same way running the same type of processes that write/read to the database. Server 2 filled up pg_xlog and crashed. When it came back we began to experience slow query performance. I ran an ANALYZE against the tables involved in the query, but for some reason the optimizer is still choosing a hash join instead of a nested loop join, which is killing performance. The query on Server 1 runs in 14 seconds and the same query on Server 2 runs in 15 minutes:
Server 1
-> HashAggregate (cost=501922.84..501922.85 rows=1 width=532)
-> Nested Loop (cost=250961.41..501922.83 rows=1 width=532)
Join Filter: (("outer".host)::text = ("inner".host)::text)
-> HashAggregate (cost=250961.40..250961.41 rows=1 width=26)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..250961.40 rows=1 width=26)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..250961.40 rows=1 width=42)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
Server 2
-> HashAggregate (cost=1814101.48..1814129.36 rows=2230 width=532)
-> Hash Join (cost=906978.28..1814079.18 rows=2230 width=532)
Hash Cond: (("outer".host)::text = ("inner".host)::text)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..906877.88 rows=40147 width=42)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
-> Hash (cost=906978.27..906978.27 rows=1 width=516)
-> HashAggregate (cost=906978.25..906978.26 rows=1 width=26)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..906877.88 rows=40147 width=26)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
Besides ANALYZE, what else can I do / look at to figure out why the optimizer is making the choices it is on Server 2, now causing slow performance problems?
Thanks,
Keaton
On 05/12/2007, Keaton Adams <kadams@mxlogic.com> wrote:
Look at table pg_stats eg
select * from pg_stats where tablename='msl_fs_size';
also read, http://www.postgresql.org/docs/8.2/interactive/planner-stats.html and
http://www.postgresql.org/docs/8.2/interactive/planner-stats-details.html
Can be a little hard going but it might help. On the other hand it might just baffle you with science.
Peter
We're running PostgreSQL 8.1.4 on RHEL. I'm running a vacuum analyze on the mxl_fs_size table to see if that shows anything.
-Keaton
On 12/4/07 10:50 PM, "Keaton Adams" <kadams@mxlogic.com> wrote:
We have two servers configured the same way running the same type of processes that write/read to the database. Server 2 filled up pg_xlog and crashed. When it came back we began to experience slow query performance. I ran an ANALYZE against the tables involved in the query, but for some reason the optimizer is still choosing a hash join instead of a nested loop join, which is killing performance. The query on Server 1 runs in 14 seconds and the same query on Server 2 runs in 15 minutes:
Server 1
-> HashAggregate (cost=501922.84..501922.85 rows=1 width=532)
-> Nested Loop (cost=250961.41..501922.83 rows=1 width=532)
Join Filter: (("outer".host)::text = ("inner".host)::text)
-> HashAggregate (cost=250961.40..250961.41 rows=1 width=26)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..250961.40 rows=1 width=26)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..250961.40 rows=1 width=42)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
Server 2
-> HashAggregate (cost=1814101.48..1814129.36 rows=2230 width=532)
-> Hash Join (cost=906978.28..1814079.18 rows=2230 width=532)
Hash Cond: (("outer".host)::text = ("inner".host)::text)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh (cost=0.01..906877.88 rows=40147 width=42)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
-> Hash (cost=906978.27..906978.27 rows=1 width=516)
-> HashAggregate (cost=906978.25..906978.26 rows=1 width=26)
-> Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size (cost=0.01..906877.88 rows=40147 width=26)
Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))
Besides ANALYZE, what else can I do / look at to figure out why the optimizer is making the choices it is on Server 2, now causing slow performance problems?
Look at table pg_stats eg
select * from pg_stats where tablename='msl_fs_size';
also read, http://www.postgresql.org/docs/8.2/interactive/planner-stats.html and
http://www.postgresql.org/docs/8.2/interactive/planner-stats-details.html
Can be a little hard going but it might help. On the other hand it might just baffle you with science.
Peter
Thanks,
Keaton