Обсуждение: Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3
Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3
От
willmington@gmx.net (R. Willmington)
Дата:
Ladies and gentlemen, I have a strange problem with index usage: We have two identical database layouts in two different databases in the same db server: Postgres 7.3.2 Server Database 1 Database 2 |- cms_contents |- cms_contents |- cms_log |- cms_log Nedless to say the tables have the same indices. When running the following query, the planner decides to use all suitable table indices in database 1, but for the tables in database 2 it doesn't. This results in the query for database 1 taking about 1 second, whereas for database 2 the query takes about 4 minutes (!). SELECT a.id, b.log_timestamp as timestamp FROM cms_contents a, cms_log b WHERE a.state_id = 'd' AND a.parent_id NOT IN ( SELECT id FROM cms_contents WHERE state_id = 'd' AND language='de' and domain_id = 1 ) AND a.domain_id = 1 AND a.language = 'de' AND a.id = b.content_id AND b.log_action = 'deleteContent' AND b.log_timestamp = ( SELECT max(log_timestamp) FROM cms_log WHERE content_id = a.id AND b.domain_id = a.domain_id AND b.language = a.language ) ORDER BY timestamp I have already analyzed, reindexed, vacuumed (FULL...) the tables and both databases with no results. Indices in database 2 exist and are not broken, they are used for simple querys such as "select * from cms_log where log_action ='XYZ'". I have even restarted the postgres server to exclude any "opened - curser" based problems. Here is what i get when doing an expain for the big select query: _______________________________________ Database 1 (it is obviously using the indices) _______________________________________ Sort (cost=36615.14..36615.14 rows=1 width=36) Sort Key: b.log_timestamp -> Nested Loop (cost=0.00..36615.13 rows=1 width=36) Join Filter: ("outer".log_timestamp = (subplan)) -> Index Scan using cms_log_log_action on cms_log b (cost=0.00..451.79 rows=120 width=22) Index Cond: (log_action = 'deleteContent'::character varying) -> Index Scan using pk__cms_contents on cms_contents a (cost=0.00..213.55 rows=1 width=14) Index Cond: ((a.id = "outer".content_id) AND (a.domain_id = 1) AND (a."language" = 'de'::bpchar)) Filter: ((state_id = 'd'::bpchar) AND (subplan)) SubPlan -> Materialize (cost=207.58..207.58 rows=281 width=4) -> Index Scan using cms_contents_state_id on cms_contents (cost=0.00..207.58 rows=281 width=4) Index Cond: (state_id = 'd'::bpchar) Filter: (("language" = 'de'::bpchar) AND (domain_id = 1)) SubPlan -> Aggregate (cost=86.80..86.80 rows=1 width=8) -> Result (cost=0.00..86.73 rows=27 width=8) One-Time Filter: (($1 = $2) AND ($3 = $4)) -> Index Scan using cms_log_content_id on cms_log (cost=0.00..86.73 rows=27 width=8) Index Cond: (content_id = $0) _______________________________________ Database 2 (it is obviously not using all indices) _______________________________________ Sort (cost=337657.43..337657.43 rows=1 width=36) Sort Key: b.log_timestamp -> Nested Loop (cost=0.00..337657.42 rows=1 width=36) Join Filter: ("outer".log_timestamp = (subplan)) -> Seq Scan on cms_log b (cost=0.00..2492.53 rows=1491 width=22) Filter: (log_action = 'deleteContent'::character varying) -> Index Scan using pk__cms_contents on cms_contents a (cost=0.00..172.78 rows=1 width=14) Index Cond: ((a.id = "outer".content_id) AND (a.domain_id = 1) AND (a."language" = 'de'::bpchar)) Filter: ((state_id = 'd'::bpchar) AND (subplan)) SubPlan -> Seq Scan on cms_contents (cost=0.00..333.94 rows=2347 width=4) Filter: ((state_id = 'd'::bpchar) AND ("language" = 'de'::bpchar) AND (domain_id = 1)) SubPlan -> Aggregate (cost=51.95..51.95 rows=1 width=8) -> Result (cost=0.00..51.89 rows=24 width=8) One-Time Filter: (($1 = $2) AND ($3 = $4)) -> Index Scan using cms_log_content_id on cms_log (cost=0.00..51.89 rows=24 width=8) Index Cond: (content_id = $0) _____________________________________________________________ It seems to me that the query planner has some sort of bug when predicting the costs for the query. If i set SET ENABLE_SEQSCAN = NO; before running the query in database 2, this is what i get from the planner: ______________________________________________________________ Sort (cost=826278.50..826278.51 rows=1 width=36) Sort Key: b.log_timestamp -> Nested Loop (cost=0.00..826278.49 rows=1 width=36) Join Filter: ("outer".log_timestamp = (subplan)) -> Index Scan using cms_log_log_action on cms_log b (cost=0.00..3539.29 rows=1491 width=22) Index Cond: (log_action = 'deleteContent'::character varying) -> Index Scan using pk__cms_contents on cms_contents a (cost=0.00..499.72 rows=1 width=14) Index Cond: ((a.id = "outer".content_id) AND (a.domain_id = 1) AND (a."language" = 'de'::bpchar)) Filter: ((state_id = 'd'::bpchar) AND (subplan)) SubPlan -> Materialize (cost=493.92..493.92 rows=2347 width=4) -> Index Scan using cms_contents_state_id on cms_contents (cost=0.00..493.92 rows=2347 width=4) Index Cond: (state_id = 'd'::bpchar) Filter: (("language" = 'de'::bpchar) AND (domain_id = 1)) SubPlan -> Aggregate (cost=51.95..51.95 rows=1 width=8) -> Result (cost=0.00..51.89 rows=24 width=8) One-Time Filter: (($1 = $2) AND ($3 = $4)) -> Index Scan using cms_log_content_id on cms_log (cost=0.00..51.89 rows=24 width=8) Index Cond: (content_id = $0) ______________________________________________________ So the planner predicts that the costs for the select with index usage are about 3 times higher than without index usage, whereas in fact it is a lot faster. How can i improve the predicion? Am i to change the query? Here are the table layouts: ______________________________________________ cms_contents: ______________________________________________ Column | Type | Modifiers -----------------------+------------------------+------------------------------- id | integer | not null domain_id | integer | not null language | character(2) | not null state_id | character(1) | not null type_name | character varying(50) | not null parent_id | integer | not null sort_children_by | character varying(100) | not null default 'order_hint' show_children_in_tree | bit(1) | not null default B'1'::"bit" is_online | bit(1) | Indexes: pk__cms_contents primary key btree (id, domain_id, "language"), cms_contents_children_index btree (domain_id, parent_id, "language", state_id, is_online), cms_contents_state_id btree (state_id) ______________________________________________ cms_log: ______________________________________________ Column | Type | Modifiers ---------------+-----------------------------+------------------------ content_id | integer | not null domain_id | integer | not null language | character(2) | not null log_timestamp | timestamp without time zone | not null default now() log_user | character varying(50) | not null log_action | character varying(50) | not null Indexes: cms_log_content_id btree (content_id), cms_log_content_id_domain_action_language btree (content_id, domain_id, "language", log_action), cms_log_log_action btree (log_action), cms_log_log_timestamp btree (log_timestamp) Here are the table statistics: database 1: cms_contents: 13109 entries cms_log: 119166 entries database 2: cms_contents: 10436 entries cms_log: 105922 entries Any advice appreciated, Kind Regards, R. Willmington