Обсуждение: Preferring index-only-scan when the cost is equal

Поиск
Список
Период
Сортировка

Preferring index-only-scan when the cost is equal

От
Yugo Nagata
Дата:
Hi,

I found that there is a situation that even when index only scan can be effective, 
the planner doesn't select this.  The planner makes indexe paths in descending
order of indexrelid, and the new path is discarded if its cost is not less than
the existing paths' cost. As a result, IndexOnlyScan path can be discard even 
hough it may be effective than normal IndexScan.

Here is a example;

=# create table test1 (i int, d int);
CREATE TABLE
=# create index on test1(i) include (d);
CREATE INDEX

=# explain select d from test1 where i = 0;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Index Only Scan using test1_i_d_idx on test1  (cost=0.15..36.35 rows=11 width=4)
   Index Cond: (i = 0)
(2 rows)

=# create index on test1(i) ;
CREATE INDEX
=# explain select d from test1 where i = 0;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Scan using test1_i_idx on test1  (cost=0.15..36.35 rows=11 width=4)
   Index Cond: (i = 0)
(2 rows)


This is not new for the covered index feature. We can see the same thing when using
multi-column indexes.


=# create table test2 (i int, d int);
CREATE TABLE
=# create index on test2(i,d);
CREATE INDEX
=# explain select d from test2 where i = 0;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Index Only Scan using test2_i_d_idx on test2  (cost=0.15..36.35 rows=11 width=4)
   Index Cond: (i = 0)
(2 rows)

=# create index on test2(i);
CREATE INDEX
=# explain select d from test2 where i = 0;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Scan using test2_i_idx on test2  (cost=0.15..36.35 rows=11 width=4)
   Index Cond: (i = 0)
(2 rows)


Attached is a patch to prefer index-only-scan when the cost is equal to other index
path.  Honestly, I'm not sure this is the best way. Any comments and advices would
be appriciated.

Regards,

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Preferring index-only-scan when the cost is equal

От
Ashutosh Bapat
Дата:
On Wed, Jul 11, 2018 at 11:03 AM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> Hi,
>
> I found that there is a situation that even when index only scan can be effective,
> the planner doesn't select this.  The planner makes indexe paths in descending
> order of indexrelid, and the new path is discarded if its cost is not less than
> the existing paths' cost. As a result, IndexOnlyScan path can be discard even
> hough it may be effective than normal IndexScan.
>
> Here is a example;
>
> =# create table test1 (i int, d int);
> CREATE TABLE
> =# create index on test1(i) include (d);
> CREATE INDEX
>
> =# explain select d from test1 where i = 0;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Index Only Scan using test1_i_d_idx on test1  (cost=0.15..36.35 rows=11 width=4)
>    Index Cond: (i = 0)
> (2 rows)
>
> =# create index on test1(i) ;
> CREATE INDEX
> =# explain select d from test1 where i = 0;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Index Scan using test1_i_idx on test1  (cost=0.15..36.35 rows=11 width=4)
>    Index Cond: (i = 0)
> (2 rows)
>
>
> This is not new for the covered index feature. We can see the same thing when using
> multi-column indexes.
>
>
> =# create table test2 (i int, d int);
> CREATE TABLE
> =# create index on test2(i,d);
> CREATE INDEX
> =# explain select d from test2 where i = 0;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Index Only Scan using test2_i_d_idx on test2  (cost=0.15..36.35 rows=11 width=4)
>    Index Cond: (i = 0)
> (2 rows)
>
> =# create index on test2(i);
> CREATE INDEX
> =# explain select d from test2 where i = 0;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Index Scan using test2_i_idx on test2  (cost=0.15..36.35 rows=11 width=4)
>    Index Cond: (i = 0)
> (2 rows)
>
>
> Attached is a patch to prefer index-only-scan when the cost is equal to other index
> path.  Honestly, I'm not sure this is the best way. Any comments and advices would
> be appriciated.
>

I don't think we should change add_path() for this. We will
unnecessarily check that condition even for the cases where we do not
create index paths. I think we should fix the caller of add_path()
instead to add index only path before any index paths. For that the
index list needs to be sorted by the possibility of using index only
scan.

But I think in your case, it might be better to first check whether
there is any costing error because of which index only scan's path has
the same cost as index scan path. Also I don't see any testcase which
will show why index only scan would be more efficient in your case.
May be provide output of EXPLAIN ANALYZE.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: Preferring index-only-scan when the cost is equal

От
Tomas Vondra
Дата:
On 07/11/2018 01:28 PM, Ashutosh Bapat wrote:
> On Wed, Jul 11, 2018 at 11:03 AM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
>> Hi,
>>
>> I found that there is a situation that even when index only scan can be effective,
>> the planner doesn't select this.  The planner makes indexe paths in descending
>> order of indexrelid, and the new path is discarded if its cost is not less than
>> the existing paths' cost. As a result, IndexOnlyScan path can be discard even
>> hough it may be effective than normal IndexScan.
>>
>> Here is a example;
>>
>> =# create table test1 (i int, d int);
>> CREATE TABLE
>> =# create index on test1(i) include (d);
>> CREATE INDEX
>>
>> =# explain select d from test1 where i = 0;
>>                                      QUERY PLAN
>> ----------------------------------------------------------------------------------
>>   Index Only Scan using test1_i_d_idx on test1  (cost=0.15..36.35 rows=11 width=4)
>>     Index Cond: (i = 0)
>> (2 rows)
>>
>> =# create index on test1(i) ;
>> CREATE INDEX
>> =# explain select d from test1 where i = 0;
>>                                  QUERY PLAN
>> ---------------------------------------------------------------------------
>>   Index Scan using test1_i_idx on test1  (cost=0.15..36.35 rows=11 width=4)
>>     Index Cond: (i = 0)
>> (2 rows)
>>
>>
>> This is not new for the covered index feature. We can see the same thing when using
>> multi-column indexes.
>>
>>
>> =# create table test2 (i int, d int);
>> CREATE TABLE
>> =# create index on test2(i,d);
>> CREATE INDEX
>> =# explain select d from test2 where i = 0;
>>                                      QUERY PLAN
>> ----------------------------------------------------------------------------------
>>   Index Only Scan using test2_i_d_idx on test2  (cost=0.15..36.35 rows=11 width=4)
>>     Index Cond: (i = 0)
>> (2 rows)
>>
>> =# create index on test2(i);
>> CREATE INDEX
>> =# explain select d from test2 where i = 0;
>>                                  QUERY PLAN
>> ---------------------------------------------------------------------------
>>   Index Scan using test2_i_idx on test2  (cost=0.15..36.35 rows=11 width=4)
>>     Index Cond: (i = 0)
>> (2 rows)
>>
>>
>> Attached is a patch to prefer index-only-scan when the cost is equal to other index
>> path.  Honestly, I'm not sure this is the best way. Any comments and advices would
>> be appriciated.
>>
> 
> I don't think we should change add_path() for this. We will
> unnecessarily check that condition even for the cases where we do not
> create index paths. I think we should fix the caller of add_path()
> instead to add index only path before any index paths. For that the
> index list needs to be sorted by the possibility of using index only
> scan.
> 
> But I think in your case, it might be better to first check whether
> there is any costing error because of which index only scan's path has
> the same cost as index scan path. Also I don't see any testcase which
> will show why index only scan would be more efficient in your case.
> May be provide output of EXPLAIN ANALYZE.
> 

I suspect this only happens due to testing on empty tables. Not only is 
testing of indexes on small tables rather pointless in general, but more 
importantly there will be no statistics. So we fall back to some default 
estimates, but we also don't have relallvisible etc which is crucial for 
estimating index-only scans. I'd bet that's why the cost estimates for 
index scans and index-only scans are the same here.

Yugo-san, have you observed this behavior on larger tables?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Preferring index-only-scan when the cost is equal

От
Yugo Nagata
Дата:
On Wed, 11 Jul 2018 14:37:46 +0200
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> 
> On 07/11/2018 01:28 PM, Ashutosh Bapat wrote:

> > I don't think we should change add_path() for this. We will
> > unnecessarily check that condition even for the cases where we do not
> > create index paths. I think we should fix the caller of add_path()
> > instead to add index only path before any index paths. For that the
> > index list needs to be sorted by the possibility of using index only
> > scan.
> > 
> > But I think in your case, it might be better to first check whether
> > there is any costing error because of which index only scan's path has
> > the same cost as index scan path. Also I don't see any testcase which
> > will show why index only scan would be more efficient in your case.
> > May be provide output of EXPLAIN ANALYZE.
> > 
> 
> I suspect this only happens due to testing on empty tables. Not only is 
> testing of indexes on small tables rather pointless in general, but more 
> importantly there will be no statistics. So we fall back to some default 
> estimates, but we also don't have relallvisible etc which is crucial for 
> estimating index-only scans. I'd bet that's why the cost estimates for 
> index scans and index-only scans are the same here.

You are right. When the table have rows and this is vacuumed, index only
scan's cost is cheaper and chosen properly. Sorry, I have jumped to the
conclusion before confirming this.

Thanks,

-- 
Yugo Nagata <nagata@sraoss.co.jp>


Re: Preferring index-only-scan when the cost is equal

От
Tomas Vondra
Дата:

On 07/12/2018 03:44 AM, Yugo Nagata wrote:
> On Wed, 11 Jul 2018 14:37:46 +0200
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> 
>>
>> On 07/11/2018 01:28 PM, Ashutosh Bapat wrote:
> 
>>> I don't think we should change add_path() for this. We will
>>> unnecessarily check that condition even for the cases where we do not
>>> create index paths. I think we should fix the caller of add_path()
>>> instead to add index only path before any index paths. For that the
>>> index list needs to be sorted by the possibility of using index only
>>> scan.
>>>
>>> But I think in your case, it might be better to first check whether
>>> there is any costing error because of which index only scan's path has
>>> the same cost as index scan path. Also I don't see any testcase which
>>> will show why index only scan would be more efficient in your case.
>>> May be provide output of EXPLAIN ANALYZE.
>>>
>>
>> I suspect this only happens due to testing on empty tables. Not only is
>> testing of indexes on small tables rather pointless in general, but more
>> importantly there will be no statistics. So we fall back to some default
>> estimates, but we also don't have relallvisible etc which is crucial for
>> estimating index-only scans. I'd bet that's why the cost estimates for
>> index scans and index-only scans are the same here.
> 
> You are right. When the table have rows and this is vacuumed, index only
> scan's cost is cheaper and chosen properly. Sorry, I have jumped to the
> conclusion before confirming this.
> 

I'm very experienced in this. I've done this mistake a million times ;-)

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Preferring index-only-scan when the cost is equal

От
Yugo Nagata
Дата:
On Thu, 12 Jul 2018 12:59:15 +0200
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> 
> 
> On 07/12/2018 03:44 AM, Yugo Nagata wrote:
> > On Wed, 11 Jul 2018 14:37:46 +0200
> > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > 
> >>
> >> On 07/11/2018 01:28 PM, Ashutosh Bapat wrote:
> > 
> >>> I don't think we should change add_path() for this. We will
> >>> unnecessarily check that condition even for the cases where we do not
> >>> create index paths. I think we should fix the caller of add_path()
> >>> instead to add index only path before any index paths. For that the
> >>> index list needs to be sorted by the possibility of using index only
> >>> scan.
> >>>
> >>> But I think in your case, it might be better to first check whether
> >>> there is any costing error because of which index only scan's path has
> >>> the same cost as index scan path. Also I don't see any testcase which
> >>> will show why index only scan would be more efficient in your case.
> >>> May be provide output of EXPLAIN ANALYZE.
> >>>
> >>
> >> I suspect this only happens due to testing on empty tables. Not only is
> >> testing of indexes on small tables rather pointless in general, but more
> >> importantly there will be no statistics. So we fall back to some default
> >> estimates, but we also don't have relallvisible etc which is crucial for
> >> estimating index-only scans. I'd bet that's why the cost estimates for
> >> index scans and index-only scans are the same here.
> > 
> > You are right. When the table have rows and this is vacuumed, index only
> > scan's cost is cheaper and chosen properly. Sorry, I have jumped to the
> > conclusion before confirming this.
> > 
> 
> I'm very experienced in this. I've done this mistake a million times ;-)

Thank you. It is really encouraging for me.

> 
> regards
> 
> -- 
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Yugo Nagata <nagata@sraoss.co.jp>