Обсуждение: Plan for in with one argument

Поиск
Список
Период
Сортировка

Plan for in with one argument

От
Marcus Engene
Дата:
Hi List,

With automated queries where I have COLUMN IN (), I get a different plan
from COLUMN = (). That would make sense if there are several arguments,
but in the case with just one argument the penalty can be seveare. One
query went from 5s execution time to a few houndreds of mS when I
changed IN to = if the number of arguments is 1.

Is there a technical reason for not treating IN with one argument as =
in that case?

pondDump=> explain analyze  select
     pic.objectid as pic_objectid
from
     pond_item_common pic
where
     pic.pond_user IN (select pu2.objectid from pond_user pu2 where
username_locase IN ('iceberger'))
limit 100;
                                                                   QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=15.41..396.89 rows=100 width=4) (actual time=0.047..0.061
rows=11 loops=1)
    ->  Nested Loop  (cost=15.41..1400.19 rows=363 width=4) (actual
time=0.046..0.056 rows=11 loops=1)
          ->  HashAggregate  (cost=8.28..8.29 rows=1 width=4) (actual
time=0.026..0.026 rows=1 loops=1)
                ->  Index Scan using pond_user_c2 on pond_user pu2
(cost=0.00..8.27 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
                      Index Cond: ((username_locase)::text =
'iceberger'::text)
          ->  Bitmap Heap Scan on pond_item_common pic
(cost=7.13..1387.36 rows=363 width=8) (actual time=0.015..0.024 rows=11
loops=1)
                Recheck Cond: (pic.pond_user = pu2.objectid)
                ->  Bitmap Index Scan on pond_item_common_x1
(cost=0.00..7.04 rows=363 width=0) (actual time=0.012..0.012 rows=11
loops=1)
                      Index Cond: (pic.pond_user = pu2.objectid)
  Total runtime: 0.181 ms
(10 rows)

pondDump=> explain analyze
select
     pic.objectid as pic_objectid
from
     pond_item_common pic
where
     pic.pond_user = (select pu2.objectid from pond_user pu2 where
username_locase IN ('iceberger'))
limit 100;
                                                              QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=15.41..395.88 rows=100 width=4) (actual time=0.043..0.055
rows=11 loops=1)
    InitPlan 1 (returns $0)
      ->  Index Scan using pond_user_c2 on pond_user pu2
(cost=0.00..8.27 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
            Index Cond: ((username_locase)::text = 'iceberger'::text)
    ->  Bitmap Heap Scan on pond_item_common pic  (cost=7.13..1388.27
rows=363 width=4) (actual time=0.042..0.053 rows=11 loops=1)
          Recheck Cond: (pond_user = $0)
          ->  Bitmap Index Scan on pond_item_common_x1  (cost=0.00..7.04
rows=363 width=0) (actual time=0.038..0.038 rows=11 loops=1)
                Index Cond: (pond_user = $0)
  Total runtime: 0.096 ms
(9 rows)

pondDump=>

Best regards,
Marcus


Re: Plan for in with one argument

От
Alban Hertroys
Дата:
On 11 Jul 2010, at 11:38, Marcus Engene wrote:

> Hi List,
>
> With automated queries where I have COLUMN IN (), I get a different plan from COLUMN = (). That would make sense if
thereare several arguments, but in the case with just one argument the penalty can be seveare. One query went from 5s
executiontime to a few houndreds of mS when I changed IN to = if the number of arguments is 1. 
>
> Is there a technical reason for not treating IN with one argument as = in that case?

It does that already for constant IN-lists:
=> create table test (id serial PRIMARY KEY);
=> insert into test (id) SELECT nextval('test_id_seq') from generate_
series(1, 10000);
=> ANALYZE test;
=> explain analyse select * from test where id IN (15);
                                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..8.27 rows=1 width=4) (actual time=0.024..0.029 rows=1 loops=1)
   Index Cond: (id = 15)
 Total runtime: 0.102 ms
(3 rows)


However, you're using a subquery to get the IN-list. I'm pretty sure the planner cannot know for certain that your
subquerywill return only one row, so it cannot substitute your IN(subquery) with =(subquery). 

You'd probably be better off using an EXISTS instead of an IN there, that should certainly help for cases where the
subqueryreturns many records, but it also gives the planner a better idea of your intentions. 

> pondDump=> explain analyze  select
>    pic.objectid as pic_objectid
> from
>    pond_item_common pic
> where
>    pic.pond_user IN (select pu2.objectid from pond_user pu2 where username_locase IN ('iceberger'))
> limit 100;
>                                                                  QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=15.41..396.89 rows=100 width=4) (actual time=0.047..0.061 rows=11 loops=1)
>   ->  Nested Loop  (cost=15.41..1400.19 rows=363 width=4) (actual time=0.046..0.056 rows=11 loops=1)
>         ->  HashAggregate  (cost=8.28..8.29 rows=1 width=4) (actual time=0.026..0.026 rows=1 loops=1)
>               ->  Index Scan using pond_user_c2 on pond_user pu2  (cost=0.00..8.27 rows=1 width=4) (actual
time=0.017..0.018rows=1 loops=1) 
>                     Index Cond: ((username_locase)::text = 'iceberger'::text)
>         ->  Bitmap Heap Scan on pond_item_common pic  (cost=7.13..1387.36 rows=363 width=8) (actual time=0.015..0.024
rows=11loops=1) 
>               Recheck Cond: (pic.pond_user = pu2.objectid)
>               ->  Bitmap Index Scan on pond_item_common_x1  (cost=0.00..7.04 rows=363 width=0) (actual
time=0.012..0.012rows=11 loops=1) 
>                     Index Cond: (pic.pond_user = pu2.objectid)
> Total runtime: 0.181 ms
> (10 rows)
>
> pondDump=> explain analyze
> select
>    pic.objectid as pic_objectid
> from
>    pond_item_common pic
> where
>    pic.pond_user = (select pu2.objectid from pond_user pu2 where username_locase IN ('iceberger'))
> limit 100;
>                                                             QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=15.41..395.88 rows=100 width=4) (actual time=0.043..0.055 rows=11 loops=1)
>   InitPlan 1 (returns $0)
>     ->  Index Scan using pond_user_c2 on pond_user pu2  (cost=0.00..8.27 rows=1 width=4) (actual time=0.017..0.018
rows=1loops=1) 
>           Index Cond: ((username_locase)::text = 'iceberger'::text)
>   ->  Bitmap Heap Scan on pond_item_common pic  (cost=7.13..1388.27 rows=363 width=4) (actual time=0.042..0.053
rows=11loops=1) 
>         Recheck Cond: (pond_user = $0)
>         ->  Bitmap Index Scan on pond_item_common_x1  (cost=0.00..7.04 rows=363 width=0) (actual time=0.038..0.038
rows=11loops=1) 
>               Index Cond: (pond_user = $0)
> Total runtime: 0.096 ms
> (9 rows)
>
> pondDump=>
>
> Best regards,
> Marcus
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
>

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4c39a024286213416620622!



Re: Plan for in with one argument

От
Marcus Engene
Дата:
On 7/11/10 12:42 , Alban Hertroys wrote:
> On 11 Jul 2010, at 11:38, Marcus Engene wrote:
>
>
>> Hi List,
>>
>> With automated queries where I have COLUMN IN (), I get a different plan from COLUMN = (). That would make sense if
thereare several arguments, but in the case with just one argument the penalty can be seveare. One query went from 5s
executiontime to a few houndreds of mS when I changed IN to = if the number of arguments is 1. 
>>
>> Is there a technical reason for not treating IN with one argument as = in that case?
>>
> It does that already for constant IN-lists:
> =>  create table test (id serial PRIMARY KEY);
> =>  insert into test (id) SELECT nextval('test_id_seq') from generate_
> series(1, 10000);
> =>  ANALYZE test;
> =>  explain analyse select * from test where id IN (15);
>                                                     QUERY PLAN
> --------------------------------------------------------------------------------
>   Index Scan using test_pkey on test  (cost=0.00..8.27 rows=1 width=4) (actual time=0.024..0.029 rows=1 loops=1)
>     Index Cond: (id = 15)
>   Total runtime: 0.102 ms
> (3 rows)
>
>
> However, you're using a subquery to get the IN-list. I'm pretty sure the planner cannot know for certain that your
subquerywill return only one row, so it cannot substitute your IN(subquery) with =(subquery). 
>
> You'd probably be better off using an EXISTS instead of an IN there, that should certainly help for cases where the
subqueryreturns many records, but it also gives the planner a better idea of your intentions. 
>
>

Hi Alban,

This makes great sense both in theory and empiric tests. Thanks for the
explanation.

Best regards,
Marcus