[GENERAL] Equivalence Classes when using IN

Поиск
Список
Период
Сортировка
От Kim Rose Carlsen
Тема [GENERAL] Equivalence Classes when using IN
Дата
Msg-id VI1PR05MB1709E0E0ADE98AFC0D8D2177C7770@VI1PR05MB1709.eurprd05.prod.outlook.com
обсуждение исходный текст
Ответы Re: [GENERAL] Equivalence Classes when using IN  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-general
Hi

I have this query where I think it's strange that the join doesn't pull the where condition in since RHS is equal to LHS. It might be easier to expain with an example

Setup
            CREATE TABLE customer (
              customer_id INTEGER PRIMARY KEY
            );

            CREATE TABLE product (
              product_id  INTEGER PRIMARY KEY,
              customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
            );
            
            INSERT INTO customer (SELECT generate_series FROM generate_series(0, 1000000));

            INSERT INTO product (product_id, customer_id) (SELECT generate_series, generate_series / 2 FROM generate_series(0, 2000));


Query

           EXPLAIN ANALYSE
            SELECT *
              FROM customer c
              JOIN (SELECT DISTINCT ON (customer_id) * FROM product ORDER BY customer_id, product_id) p
                ON c.customer_id = p.customer_id
             WHERE c.customer_id IN (500, 501);

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=172.43..186.25 rows=1 width=12) (actual time=1.350..1.353 rows=2 loops=1)
   Merge Cond: (c.customer_id = product.customer_id)
   ->  Sort  (cost=13.93..13.93 rows=2 width=4) (actual time=0.036..0.036 rows=2 loops=1)
         Sort Key: c.customer_id
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on customer c  (cost=8.58..13.92 rows=2 width=4) (actual time=0.026..0.027 rows=2 loops=1)
               Recheck Cond: (customer_id = ANY ('{500,501}'::integer[]))
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on customer_pkey  (cost=0.00..8.58 rows=2 width=0) (actual time=0.018..0.018 rows=2 loops=1)
                     Index Cond: (customer_id = ANY ('{500,501}'::integer[]))
   ->  Unique  (cost=158.51..169.81 rows=200 width=8) (actual time=0.783..1.221 rows=502 loops=1)
         ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.782..0.929 rows=1003 loops=1)
               Sort Key: product.customer_id, product.product_id
               Sort Method: quicksort  Memory: 142kB
               ->  Seq Scan on product  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.015..0.366 rows=2001 loops=1)
 Planning time: 0.281 ms
 Execution time: 1.432 ms

I would expect that since c.customer_id = p.customer_id then p.customer_id IN (500, 501). If I apply this rule myself, I get a much nicer plan (and it could be even better with an index on product_id).


                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=52.70..53.11 rows=1 width=12) (actual time=0.686..0.693 rows=2 loops=1)
   Merge Cond: (product.customer_id = c.customer_id)
   ->  Unique  (cost=38.77..38.89 rows=22 width=8) (actual time=0.647..0.651 rows=2 loops=1)
         ->  Sort  (cost=38.77..38.83 rows=23 width=8) (actual time=0.646..0.647 rows=4 loops=1)
               Sort Key: product.customer_id, product.product_id
               Sort Method: quicksort  Memory: 25kB
               ->  Seq Scan on product  (cost=0.00..38.25 rows=23 width=8) (actual time=0.331..0.632 rows=4 loops=1)
                     Filter: (customer_id = ANY ('{500,501}'::integer[]))
                     Rows Removed by Filter: 1997
   ->  Sort  (cost=13.93..13.93 rows=2 width=4) (actual time=0.033..0.033 rows=2 loops=1)
         Sort Key: c.customer_id
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on customer c  (cost=8.58..13.92 rows=2 width=4) (actual time=0.025..0.026 rows=2 loops=1)
               Recheck Cond: (customer_id = ANY ('{500,501}'::integer[]))
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on customer_pkey  (cost=0.00..8.58 rows=2 width=0) (actual time=0.018..0.018 rows=2 loops=1)
                     Index Cond: (customer_id = ANY ('{500,501}'::integer[]))
 Planning time: 0.386 ms
 Execution time: 0.774 ms
(19 rows)

Is this because postgres never consider IN clause when building equivalence class's?

Are there any interests in adding such rule?


My idea is to wrap this in a view

          CREATE VIEW view_customer AS
            SELECT c.customer_id,
                   p.product_id
              FROM customer c
         LEFT JOIN (SELECT DISTINCT ON (customer_id) * FROM product ORDER BY customer_id, product_id) p
                ON c.customer_id = p.customer_id

Where the LEFT JOIN can be pruned if there is no explicit need for product_id. Here I loose the power to express that both c.customer_id and p.customer_id is the same.

Best regards

Kim Carlsen

В списке pgsql-general по дате отправления:

Предыдущее
От: Melvin Davidson
Дата:
Сообщение: Re: [GENERAL] delete a file everytime pg server starts/crashes
Следующее
От: David Rowley
Дата:
Сообщение: Re: [GENERAL] Equivalence Classes when using IN