Re: BUG #14107: Major query planner bug regarding subqueries and indices

Поиск
Список
Период
Сортировка
От Mathias Kunter
Тема Re: BUG #14107: Major query planner bug regarding subqueries and indices
Дата
Msg-id 69e376c7-6c2c-c96d-2ad6-3734affb59b7@gmail.com
обсуждение исходный текст
Ответ на Re: BUG #14107: Major query planner bug regarding subqueries and indices  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: BUG #14107: Major query planner bug regarding subqueries and indices  (Victor Yegorov <vyegorov@gmail.com>)
Список pgsql-bugs
> Thanks for the report, but I think your subject line is a little
> exaggerated.

Sorry for that :-)

> I'd not actually thought about expanding this to IN and NOT IN, but
> likely it would be possible, providing NULLs could be handled
> correctly too. Was this the optimisation you think is missing?

No - let's consider a real-world example where the problem becomes more
evident. Assume we have the following two tables:

CREATE TABLE book (id SERIAL NOT NULL, name VARCHAR, authorId SERIAL,
CONSTRAINT book_pkey PRIMARY KEY (id));

CREATE TABLE author (id SERIAL NOT NULL, surname VARCHAR, CONSTRAINT
author_pkey PRIMARY KEY (id));

All relevant columns are indexed. The book table contains 1.3 M rows.

Now let's say we want to find books by either name or by author name. A
simple problem, actually. The following query is used (which could of
course also be rewritten to use a JOIN instead, but that's not the point
here). We find out that it has an unexpected and unacceptably long
execution time:


EXPLAIN SELECT * FROM book WHERE name = 'Harry Potter' OR authorId IN
(SELECT id FROM author WHERE surname = 'Rowling');
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
  Seq Scan on book  (cost=13.68..26792.88 rows=576709 width=40)
    Filter: (((name)::text = 'Harry Potter'::text) OR (hashed SubPlan 1))
    SubPlan 1
      ->  Bitmap Heap Scan on author  (cost=4.20..13.67 rows=6 width=4)
            Recheck Cond: ((surname)::text = 'Rowling'::text)
            ->  Bitmap Index Scan on author_surname_index
(cost=0.00..4.20 rows=6 width=0)
                  Index Cond: ((surname)::text = 'Rowling'::text)


In contrast to that, consider the query plan for the following logically
equivalent query, which executes in just a few milliseconds (!):

EXPLAIN SELECT * FROM book WHERE name = 'Harry Potter' UNION SELECT *
FROM book WHERE authorId IN (SELECT id FROM author WHERE surname =
'Rowling');
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=454.87..455.99 rows=112 width=29)
    Group Key: book.id, book.name, book.authorid
    ->  Append  (cost=0.43..454.03 rows=112 width=29)
          ->  Index Scan using book_name_index on book
(cost=0.43..16.48 rows=3 width=29)
                Index Cond: ((name)::text = 'Harry Potter'::text)
          ->  Nested Loop  (cost=4.63..436.43 rows=109 width=29)
                ->  Bitmap Heap Scan on author  (cost=4.20..13.67 rows=6
width=4)
                      Recheck Cond: ((surname)::text = 'Rowling'::text)
                      ->  Bitmap Index Scan on author_surname_index
(cost=0.00..4.20 rows=6 width=0)
                            Index Cond: ((surname)::text = 'Rowling'::text)
                ->  Index Scan using book_authorid_index on book book_1
(cost=0.43..70.28 rows=18 width=29)
                      Index Cond: (authorid = author.id)

В списке pgsql-bugs по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: BUG #14107: Major query planner bug regarding subqueries and indices
Следующее
От: Mathias Kunter
Дата:
Сообщение: Re: BUG #14107: Major query planner bug regarding subqueries and indices