Re: Performance difference between ANY and IN, also array syntax

Поиск
Список
Период
Сортировка
От Ron Mayer
Тема Re: Performance difference between ANY and IN, also array syntax
Дата
Msg-id 426F1AA5.4000709@cheapcomplexdevices.com
обсуждение исходный текст
Ответ на Re: Performance difference between ANY and IN, also array syntax  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Tom Lane wrote:
> Bart Grantham <bart@logicworks.net> writes:
>
>># SELECT * FROM connections  WHERE connectee_node_id = ANY (ARRAY[28543,28542] );
>>[ performance sucks ]
>
> Yeah :-(.  The = ANY (array) notation is new in the last release or so,
> and is completely without optimization of any kind.  Turn it into IN
> if you can.

You could also use the module in contrib/intagg to treat your array
as a table so you can JOIN against it like this:

   select * from connections
            join int_array_enum('{28543,28542}'::int[]) as iae(x)
              on (connectee_node_id = x);

This seems to work very well for me.  I show a real-life example
with a table about the same size as Bart's (about 200K rows) below.

It has the advantage of being able to do all the different join types
whenever it guesses it would be appropriate to do so.   I've never
seen "in" clauses use some of those tricks (though it might be possible
that in clauses can do that as well).  It seems to always think that
int_array_enum will return a 1000 row table (I wonder if there should
eventually be some way to tweak that), so when joining against very
large tables it happily does index scans - and when joining against
smaller tables it'll do merge joins or whatever.





=========================================================================
=========================================================================
=========================================================================
== an example going against a larger table,
== the int_array_enum trick reasonably chooses nested loops and index scans
=========================================================================
=========================================================================
=========================================================================
fli=# explain
  select *
    from lines
   where tlid = ANY ('{125060436,125060437}'::int[]);
                            QUERY PLAN
----------------------------------------------------------------
  Seq Scan on lines  (cost=0.00..26636.03 rows=254229 width=102)
    Filter: (tlid = ANY ('{125060436,125060437}'::integer[]))
(2 rows)

fli=# explain
select *
   from lines
   join int_array_enum('{125060436,125060437}'::int[]) as iae(x)
     on (tlid=x);
fli-# fli-# fli-# fli-#                                       QUERY PLAN
--------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..4228.99 rows=1000 width=106)
    ->  Function Scan on int_array_enum iae  (cost=0.00..12.50 rows=1000 width=4)
    ->  Index Scan using rtgr_lines__tlid on lines  (cost=0.00..4.20 rows=1 width=102)
          Index Cond: (lines.tlid = "outer".x)
(4 rows)


=========================================================================
=========================================================================
=========================================================================
== an example going against a smaller table
== the int_array_enum trick reasonably chooses a merge join.
=========================================================================
=========================================================================
=========================================================================
fli=# explain analyze
select *
   from rt6
   join int_array_enum('{125060436,125060437}'::int[]) as iae(x)
     on (tlid=x);
fli-# fli-# fli-# fli-#                                                             QUERY PLAN                \

--------------------------------------------------------------------------------------------------------------\
--------------------
  Merge Join  (cost=62.33..1226.41 rows=2848 width=82) (actual time=36.669..36.669 rows=0 loops=1)
    Merge Cond: ("outer".tlid = "inner".x)
    ->  Index Scan using rt6__tlid on rt6  (cost=0.00..1066.65 rows=21881 width=78) (actual time=0.020..21.982 \
rows=10544 loops=1)
    ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=0.042..0.046 rows=2 loops=1)
          Sort Key: iae.x
          ->  Function Scan on int_array_enum iae  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.020..0.0\
23 rows=2 loops=1)
  Total runtime: 36.770 ms
(7 rows)

======================================================

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Performance difference between ANY and IN, also array syntax
Следующее
От: Patrick.FICHE@AQSACOM.COM
Дата:
Сообщение: Re: Change Windows path to Unix path...