Обсуждение: Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3

Поиск
Список
Период
Сортировка

Query planner fails to calculate index usage - Postgres 7.3.2 on RedHat 7.3

От
willmington@gmx.net (R. Willmington)
Дата:
Ladies and gentlemen,

I have a strange problem with index usage:

We have two identical database layouts in two different databases
in the same db server:

                 Postgres 7.3.2 Server

           Database 1              Database 2
          |- cms_contents          |- cms_contents
          |- cms_log               |- cms_log

Nedless to say the tables have the same indices.

When running the following query, the planner
decides to use all suitable table indices in database 1, but for the
tables in database 2 it doesn't. This results in the query for
database 1 taking about 1 second, whereas for database 2 the query
takes about 4 minutes (!).

SELECT a.id, b.log_timestamp as timestamp
       FROM cms_contents a, cms_log b
       WHERE a.state_id = 'd'
             AND a.parent_id NOT IN (
                SELECT id FROM cms_contents
                 WHERE state_id = 'd'
                 AND language='de' and domain_id = 1
                )
             AND a.domain_id = 1
             AND a.language = 'de'
             AND a.id = b.content_id
             AND b.log_action = 'deleteContent'
             AND b.log_timestamp = (
                   SELECT max(log_timestamp)
                    FROM cms_log
                    WHERE content_id = a.id
                    AND b.domain_id = a.domain_id
                    AND b.language = a.language
                   )
             ORDER BY timestamp


I have already analyzed, reindexed, vacuumed (FULL...) the tables and
both
databases with no results. Indices in database 2 exist and are not
broken, they are used for simple querys such as "select * from cms_log
where log_action ='XYZ'".
I have even restarted the postgres server to exclude any "opened -
curser" based problems.

Here is what i get when doing an expain for the big select query:

_______________________________________

Database 1 (it is obviously using the indices)
_______________________________________


 Sort  (cost=36615.14..36615.14 rows=1 width=36)
   Sort Key: b.log_timestamp
   ->  Nested Loop  (cost=0.00..36615.13 rows=1 width=36)
         Join Filter: ("outer".log_timestamp = (subplan))
         ->  Index Scan using cms_log_log_action on cms_log b
              (cost=0.00..451.79 rows=120 width=22)
               Index Cond: (log_action = 'deleteContent'::character
varying)
         ->  Index Scan using pk__cms_contents on cms_contents a
              (cost=0.00..213.55 rows=1 width=14)
               Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1)
               AND (a."language" = 'de'::bpchar))
               Filter: ((state_id = 'd'::bpchar) AND (subplan))
               SubPlan
                 ->  Materialize  (cost=207.58..207.58 rows=281
width=4)
                       ->  Index Scan using cms_contents_state_id on
                       cms_contents  (cost=0.00..207.58 rows=281
width=4)
                             Index Cond: (state_id = 'd'::bpchar)
                             Filter: (("language" = 'de'::bpchar) AND
(domain_id
                                       = 1))
         SubPlan
           ->  Aggregate  (cost=86.80..86.80 rows=1 width=8)
                 ->  Result  (cost=0.00..86.73 rows=27 width=8)
                       One-Time Filter: (($1 = $2) AND ($3 = $4))
                       ->  Index Scan using cms_log_content_id on
cms_log
                            (cost=0.00..86.73 rows=27 width=8)
                             Index Cond: (content_id = $0)



_______________________________________

Database 2 (it is obviously not using all indices)
_______________________________________

 Sort  (cost=337657.43..337657.43 rows=1 width=36)
   Sort Key: b.log_timestamp
   ->  Nested Loop  (cost=0.00..337657.42 rows=1 width=36)
         Join Filter: ("outer".log_timestamp = (subplan))
         ->  Seq Scan on cms_log b  (cost=0.00..2492.53 rows=1491
width=22)
               Filter: (log_action = 'deleteContent'::character
varying)
         ->  Index Scan using pk__cms_contents on cms_contents a
(cost=0.00..172.78 rows=1 width=14)
               Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1) AND (a."language" = 'de'::bpchar))
               Filter: ((state_id = 'd'::bpchar) AND (subplan))
               SubPlan
                 ->  Seq Scan on cms_contents  (cost=0.00..333.94
rows=2347 width=4)
                       Filter: ((state_id = 'd'::bpchar) AND
("language" = 'de'::bpchar) AND (domain_id = 1))
         SubPlan
           ->  Aggregate  (cost=51.95..51.95 rows=1 width=8)
                 ->  Result  (cost=0.00..51.89 rows=24 width=8)
                       One-Time Filter: (($1 = $2) AND ($3 = $4))
                       ->  Index Scan using cms_log_content_id on
cms_log  (cost=0.00..51.89 rows=24 width=8)
                             Index Cond: (content_id = $0)


_____________________________________________________________

It seems to me that the query planner has some sort of bug when
predicting the costs for the query.

If i set SET ENABLE_SEQSCAN = NO; before running the query in database
2,
this is what i get from the planner:

______________________________________________________________

 Sort  (cost=826278.50..826278.51 rows=1 width=36)
   Sort Key: b.log_timestamp
   ->  Nested Loop  (cost=0.00..826278.49 rows=1 width=36)
         Join Filter: ("outer".log_timestamp = (subplan))
         ->  Index Scan using cms_log_log_action on cms_log b
(cost=0.00..3539.29 rows=1491 width=22)
               Index Cond: (log_action = 'deleteContent'::character
varying)
         ->  Index Scan using pk__cms_contents on cms_contents a
(cost=0.00..499.72 rows=1 width=14)
               Index Cond: ((a.id = "outer".content_id) AND
(a.domain_id = 1) AND (a."language" = 'de'::bpchar))
               Filter: ((state_id = 'd'::bpchar) AND (subplan))
               SubPlan
                 ->  Materialize  (cost=493.92..493.92 rows=2347
width=4)
                       ->  Index Scan using cms_contents_state_id on
cms_contents  (cost=0.00..493.92 rows=2347 width=4)
                             Index Cond: (state_id = 'd'::bpchar)
                             Filter: (("language" = 'de'::bpchar) AND
(domain_id = 1))
         SubPlan
           ->  Aggregate  (cost=51.95..51.95 rows=1 width=8)
                 ->  Result  (cost=0.00..51.89 rows=24 width=8)
                       One-Time Filter: (($1 = $2) AND ($3 = $4))
                       ->  Index Scan using cms_log_content_id on
cms_log  (cost=0.00..51.89 rows=24 width=8)
                             Index Cond: (content_id = $0)


______________________________________________________

So the planner predicts that the costs for the select with index usage
are about 3 times higher than without index usage, whereas in fact it
is a lot faster.

How can i improve the predicion? Am i to change the query?


Here are the table layouts:

______________________________________________

cms_contents:
______________________________________________


        Column         |          Type          |           Modifiers
-----------------------+------------------------+-------------------------------
 id                    | integer                | not null
 domain_id             | integer                | not null
 language              | character(2)           | not null
 state_id              | character(1)           | not null
 type_name             | character varying(50)  | not null
 parent_id             | integer                | not null
 sort_children_by      | character varying(100) | not null default
'order_hint'
 show_children_in_tree | bit(1)                 | not null default
B'1'::"bit"
 is_online             | bit(1)                 |

Indexes: pk__cms_contents primary key btree (id, domain_id,
"language"),
         cms_contents_children_index btree (domain_id, parent_id,
"language", state_id, is_online),
         cms_contents_state_id btree (state_id)



______________________________________________

cms_log:
______________________________________________


    Column     |            Type             |       Modifiers
---------------+-----------------------------+------------------------
 content_id    | integer                     | not null
 domain_id     | integer                     | not null
 language      | character(2)                | not null
 log_timestamp | timestamp without time zone | not null default now()
 log_user      | character varying(50)       | not null
 log_action    | character varying(50)       | not null
Indexes: cms_log_content_id btree (content_id),
         cms_log_content_id_domain_action_language btree (content_id,
domain_id, "language", log_action),
         cms_log_log_action btree (log_action),
         cms_log_log_timestamp btree (log_timestamp)



Here are the table statistics:

database 1:
  cms_contents: 13109 entries
  cms_log: 119166 entries

database 2:
  cms_contents: 10436 entries
  cms_log: 105922 entries


Any advice appreciated,

Kind Regards,
R. Willmington