Re: BUG #11441: Weird (and seems wrong) behavior of partial indexes with order by/limit

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: BUG #11441: Weird (and seems wrong) behavior of partial indexes with order by/limit
Дата
Msg-id 20140919.171544.140857218.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: BUG #11441: Weird (and seems wrong) behavior of partial indexes with order by/limit  (Maxim Boguk <maxim.boguk@gmail.com>)
Ответы Re: BUG #11441: Weird (and seems wrong) behavior of partial indexes with order by/limit  (Maxim Boguk <maxim.boguk@gmail.com>)
Re: BUG #11441: Weird (and seems wrong) behavior of partial indexes with order by/limit  (David G Johnston <david.g.johnston@gmail.com>)
Список pgsql-bugs
Hello, I think this is a behavior as desinged.

An index scans having scalar array operator on a non-first index
column are considered as 'not ordered'. This is because scalar
array operator on non-first index columns breaks ordering of the
result tuples.

Btree is the the only index capable of accepting scalar-array
operators so far.

> =# select amname from pg_am where amsearcharray;
>  amname
> --------
>  btree
> (1 row)

If my understnding is correct, it repeats scanning the index
using non-array restrictions for every array element, or every
possible combination of elements of multiple scalar arrays, so
the index-order generally won't be preserved in the result
tuples.

The one obvious exception is the case of the scalar-array
operation on the first index column. The value in the array is
sorted before the iterations mentioned above, so the the planner
can determine it to be ordered *only* for this case.

The result could be ordered if the all restrictions on all index
columns before scalar-array-op column are equal conditions, but
the case is judged to be abandoned from the viewpoint of cost and
modularitly.


Therefore, the planner eliminates the sort for the following
example, even though no meaning in itself.

create table test as (select g.i as id, (random()*100)::integer as
is_finished from generate_series(1,1000000) as g(i));
create index test_2_key on test(is_finished, id) where is_finished = ANY
(ARRAY[0, 5]);
vacuum analyze test;

explain (costs off) select * from test where is_finished IN (0,5) order by is_finished, id limit 1;

                          QUERY PLAN
--------------------------------------------------------------
 Limit
   ->  Index Only Scan using test_2_key on test
         Index Cond: (is_finished = ANY ('{0,5}'::integer[]))


regards,

At Thu, 18 Sep 2014 21:34:07 +1000, Maxim Boguk <maxim.boguk@gmail.com> wrote in
<CAK-MWwS2=8iE=BV-vPORd-+PL76HZsgC9PVzydkAUgnXXntkyQ@mail.gmail.com>
> Some update now with full reproducible test case (and some surprising
> results):
>
> Test case initialization:
>
> drop table if exists test;
> create table test as (select g.i as id, (random()*100)::integer as
> is_finished from generate_series(1,1000000) as g(i));
> create index test_1_key on test(id, is_finished) where is_finished = ANY
> (ARRAY[0, 5]);
> vacuum analyze test;
>
> Good (but not expected in that case) plan:
>
> explain analyze select * from test where is_finished=0 or is_finished=5
> order by id limit 1;
>                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.24 rows=1 width=8) (actual time=0.052..0.052 rows=1
> loops=1)
>    ->  Index Only Scan using test_1_key on test  (cost=0.00..4493.05
> rows=18921 width=8) (actual time=0.052..0.052 rows=1 loops=1)
>          Heap Fetches: 1
>  Total runtime: 0.066 ms
> (i'm very surprised than the PostgreSQL managed deduct is_finished = ANY
> (ARRAY[0, 5]) from (is_finished=0 or is_finished=5))
>
>
>
> Bad plan (techically the same query and even better suitable for the
> partial index and should have the same plan but no luck):
>
> explain analyze select * from test where is_finished IN (0,5) order by id
> limit 1;
>                                                                   QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=4809.18..4809.19 rows=1 width=8) (actual time=15.410..15.410
> rows=1 loops=1)
>    ->  Sort  (cost=4809.18..4999.44 rows=19026 width=8) (actual
> time=15.408..15.408 rows=1 loops=1)
>          Sort Key: id
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Index Only Scan using test_1_key on test  (cost=0.00..4428.66
> rows=19026 width=8) (actual time=0.051..12.277 rows=15222 loops=1)
>                Index Cond: (is_finished = ANY ('{0,5}'::integer[]))
>                Heap Fetches: 15222
>  Total runtime: 15.469 ms

--
Kyotaro Horiguchi
NTT Open Source Software Center

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: BUG #11350: ALTER SYSTEM is not DDL?
Следующее
От: Maxim Boguk
Дата:
Сообщение: Re: BUG #11441: Weird (and seems wrong) behavior of partial indexes with order by/limit