Обсуждение: Index Onlys Scan for expressions

Поиск
Список
Период
Сортировка

Index Onlys Scan for expressions

От
Ildar Musin
Дата:
Hi, hackers!

There is a known issue that index only scan (IOS) can only work with
simple index keys based on single attributes and doesn't work with index
expressions. In this patch I propose a solution that adds support of IOS
for index expressions. Here's an example:

create table abc(a int, b int, c int);
create index on abc ((a * 1000 + b), c);

with t1 as (select generate_series(1, 1000) as x),
      t2 as (select generate_series(0, 999) as x)
insert into abc(a, b, c)
     select t1.x, t2.x, t2.x from t1, t2;
vacuum analyze;

Explain results with the patch:

explain (analyze, buffers) select a * 1000 + b + c from abc where a *
1000 + b = 1001;
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1
width=4) (actual time=0.032..0.033 rows=1 loops=1)
    Index Cond: ((((a * 1000) + b)) = 1001)
    Heap Fetches: 0
    Buffers: shared hit=4
  Planning time: 0.184 ms
  Execution time: 0.077 ms
(6 rows)

Before the patch it was:

explain (analyze, buffers) select a * 1000 + b + c from abc where a *
1000 + b = 1001;
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
  Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1
width=4) (actual time=0.039..0.041 rows=1 loops=1)
    Index Cond: (((a * 1000) + b) = 1001)
    Buffers: shared hit=4
  Planning time: 0.177 ms
  Execution time: 0.088 ms
(5 rows)

This solution has limitations though: the restriction or the target
expression tree (or its part) must match exactly the index. E.g. this
expression will pass the check:

select a * 1000 + b + 100 from ...

but this will fail:

select 100 + a * 1000 + b from ...

because the parser groups it as:

(100 + a * 1000) + b

In this form it won't match any index key. Another case is when we
create index on (a+b) and then make query like 'select b+a ...' or '...
where b+a = smth' -- it won't match. This applies to regular index scan
too. Probably it worth to discuss the way to normalize index expressions
and clauses and work out more convenient way to match them.
Anyway, I will be grateful if you take a look at the patch in
attachment. Any comments and tips are welcome.

Thanks!

--
Ildar Musin
i.musin@postgrespro.ru


Вложения

Re: Index Onlys Scan for expressions

От
Tomas Vondra
Дата:
On 08/16/2016 12:03 AM, Ildar Musin wrote:
> Hi, hackers!
>
> There is a known issue that index only scan (IOS) can only work with
> simple index keys based on single attributes and doesn't work with index
> expressions. In this patch I propose a solution that adds support of IOS
> for index expressions. Here's an example:
>
> create table abc(a int, b int, c int);
> create index on abc ((a * 1000 + b), c);
>
> with t1 as (select generate_series(1, 1000) as x),
>      t2 as (select generate_series(0, 999) as x)
> insert into abc(a, b, c)
>     select t1.x, t2.x, t2.x from t1, t2;
> vacuum analyze;
>
> Explain results with the patch:
>
> explain (analyze, buffers) select a * 1000 + b + c from abc where a *
> 1000 + b = 1001;
>                                                        QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>
>  Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1
> width=4) (actual time=0.032..0.033 rows=1 loops=1)
>    Index Cond: ((((a * 1000) + b)) = 1001)
>    Heap Fetches: 0
>    Buffers: shared hit=4
>  Planning time: 0.184 ms
>  Execution time: 0.077 ms
> (6 rows)
>
> Before the patch it was:
>
> explain (analyze, buffers) select a * 1000 + b + c from abc where a *
> 1000 + b = 1001;
>                                                      QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>
>  Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1
> width=4) (actual time=0.039..0.041 rows=1 loops=1)
>    Index Cond: (((a * 1000) + b) = 1001)
>    Buffers: shared hit=4
>  Planning time: 0.177 ms
>  Execution time: 0.088 ms
> (5 rows)
>

Nice! I've only quickly skimmed through the diff, but it seems sane. 
Please add the patch to the 2016-09 CF, though.

> This solution has limitations though: the restriction or the target
> expression tree (or its part) must match exactly the index. E.g. this
> expression will pass the check:
>
> select a * 1000 + b + 100 from ...
>
> but this will fail:
>
> select 100 + a * 1000 + b from ...
>
> because the parser groups it as:
>
> (100 + a * 1000) + b
>
> In this form it won't match any index key. Another case is when we
> create index on (a+b) and then make query like 'select b+a ...' or '...
> where b+a = smth' -- it won't match. This applies to regular index scan
> too. Probably it worth to discuss the way to normalize index expressions
> and clauses and work out more convenient way to match them.
> Anyway, I will be grateful if you take a look at the patch in
> attachment. Any comments and tips are welcome.

I don't think it's a major limitation - it's quite similar to the 
limitation for partial indexes, i.e. with an index defined like

CREATE INDEX ON abc (c) WHERE a + b = 1000;

the index will not be used unless the query expression matches exactly. 
So for example this won't work:

SELECT c FROM abc WHERE b + a = 1000;

because the variables are in the opposite order. Moreover, in the target 
list it might be possible to use explicit parentheses to make it work, 
no? That is, will this work?

select 100 + (a * 1000 + b) from ...

Or will it still break the IOS?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Index Onlys Scan for expressions

От
Oleg Bartunov
Дата:
On Tue, Aug 16, 2016 at 1:03 AM, Ildar Musin <i.musin@postgrespro.ru> wrote:
> Hi, hackers!
>
> There is a known issue that index only scan (IOS) can only work with simple
> index keys based on single attributes and doesn't work with index
> expressions. In this patch I propose a solution that adds support of IOS for
> index expressions. Here's an example:
>
> create table abc(a int, b int, c int);
> create index on abc ((a * 1000 + b), c);
>
> with t1 as (select generate_series(1, 1000) as x),
>      t2 as (select generate_series(0, 999) as x)
> insert into abc(a, b, c)
>     select t1.x, t2.x, t2.x from t1, t2;
> vacuum analyze;
>
> Explain results with the patch:
>
> explain (analyze, buffers) select a * 1000 + b + c from abc where a * 1000 +
> b = 1001;
>                                                        QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>  Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1
> width=4) (actual time=0.032..0.033 rows=1 loops=1)
>    Index Cond: ((((a * 1000) + b)) = 1001)
>    Heap Fetches: 0
>    Buffers: shared hit=4
>  Planning time: 0.184 ms
>  Execution time: 0.077 ms
> (6 rows)
>
> Before the patch it was:
>
> explain (analyze, buffers) select a * 1000 + b + c from abc where a * 1000 +
> b = 1001;
>                                                      QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>  Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1 width=4)
> (actual time=0.039..0.041 rows=1 loops=1)
>    Index Cond: (((a * 1000) + b) = 1001)
>    Buffers: shared hit=4
>  Planning time: 0.177 ms
>  Execution time: 0.088 ms
> (5 rows)
>
> This solution has limitations though: the restriction or the target
> expression tree (or its part) must match exactly the index. E.g. this
> expression will pass the check:
>
> select a * 1000 + b + 100 from ...
>
> but this will fail:
>
> select 100 + a * 1000 + b from ...
>
> because the parser groups it as:
>
> (100 + a * 1000) + b
>
> In this form it won't match any index key. Another case is when we create
> index on (a+b) and then make query like 'select b+a ...' or '... where b+a =
> smth' -- it won't match. This applies to regular index scan too. Probably it
> worth to discuss the way to normalize index expressions and clauses and work
> out more convenient way to match them.

pg_operator.oprcommutative ?

> Anyway, I will be grateful if you take a look at the patch in attachment.
> Any comments and tips are welcome.
>
> Thanks!
>
> --
> Ildar Musin
> i.musin@postgrespro.ru
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



Re: Index Onlys Scan for expressions

От
Alexander Korotkov
Дата:
On Tue, Aug 16, 2016 at 9:09 AM, Oleg Bartunov <obartunov@gmail.com> wrote:
On Tue, Aug 16, 2016 at 1:03 AM, Ildar Musin <i.musin@postgrespro.ru> wrote:
> Hi, hackers!
>
> There is a known issue that index only scan (IOS) can only work with simple
> index keys based on single attributes and doesn't work with index
> expressions. In this patch I propose a solution that adds support of IOS for
> index expressions. Here's an example:
>
> create table abc(a int, b int, c int);
> create index on abc ((a * 1000 + b), c);
>
> with t1 as (select generate_series(1, 1000) as x),
>      t2 as (select generate_series(0, 999) as x)
> insert into abc(a, b, c)
>     select t1.x, t2.x, t2.x from t1, t2;
> vacuum analyze;
>
> Explain results with the patch:
>
> explain (analyze, buffers) select a * 1000 + b + c from abc where a * 1000 +
> b = 1001;
>                                                        QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
>  Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1
> width=4) (actual time=0.032..0.033 rows=1 loops=1)
>    Index Cond: ((((a * 1000) + b)) = 1001)
>    Heap Fetches: 0
>    Buffers: shared hit=4
>  Planning time: 0.184 ms
>  Execution time: 0.077 ms
> (6 rows)
>
> Before the patch it was:
>
> explain (analyze, buffers) select a * 1000 + b + c from abc where a * 1000 +
> b = 1001;
>                                                      QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>  Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1 width=4)
> (actual time=0.039..0.041 rows=1 loops=1)
>    Index Cond: (((a * 1000) + b) = 1001)
>    Buffers: shared hit=4
>  Planning time: 0.177 ms
>  Execution time: 0.088 ms
> (5 rows)
>
> This solution has limitations though: the restriction or the target
> expression tree (or its part) must match exactly the index. E.g. this
> expression will pass the check:
>
> select a * 1000 + b + 100 from ...
>
> but this will fail:
>
> select 100 + a * 1000 + b from ...
>
> because the parser groups it as:
>
> (100 + a * 1000) + b
>
> In this form it won't match any index key. Another case is when we create
> index on (a+b) and then make query like 'select b+a ...' or '... where b+a =
> smth' -- it won't match. This applies to regular index scan too. Probably it
> worth to discuss the way to normalize index expressions and clauses and work
> out more convenient way to match them.

pg_operator.oprcommutative ?

Do you mean pg_operator.oprcom?
In these examples we're also lacking of pg_operator.oprassociative...
Another problem is computational complexity of such transformations.  AFAIR, few patches for making optimizer smarter with expressions were already rejected because of this reason.
Also, even if we would have such transformation of expressions, floating points types would be still problematic, because (a + b) + c = a + (b + c) is not exactly true for them because of computational error.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Index Onlys Scan for expressions

От
Robert Haas
Дата:
On Tue, Aug 16, 2016 at 2:43 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:g of pg_operator.oprassociative...
> Another problem is computational complexity of such transformations.  AFAIR,
> few patches for making optimizer smarter with expressions were already
> rejected because of this reason.

s/few/many/

> Also, even if we would have such transformation of expressions, floating
> points types would be still problematic, because (a + b) + c = a + (b + c)
> is not exactly true for them because of computational error.

Right.  I think matching expressions exactly is plenty good enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Index Onlys Scan for expressions

От
Vladimir Sitnikov
Дата:
Hi,

I've tried your indexonlypatch5.patch against REL9_6_BETA3.
Here are some results.

TL;DR:
1) <<where type=42 and upper(vc) like '%ABC%'>> does not support index-only scan for index (type, upper(vc) varchar_pattern_ops).
3) <<(... where type=42 offset 0) where upper_vc like '%ABC%'>> does trigger index-only scan. IOS reduces number of buffers from 977 to 17 and that is impressive.

Can IOS be used for simple query like #1 as well?

Here are the details.

drop table vlsi;
create table vlsi(type numeric, vc varchar(500));
insert into vlsi(type,vc) select round(x/1000), md5('||x)||md5('||x+1)||md5(''||x+2) from generate_series(1,1000000) as s(x);
create index type_vc__vlsi on vlsi(type, upper(vc) varchar_pattern_ops);
vacuum analyze vlsi;

0) Smoke test (index only scan works when selecting indexed expression):

explain (analyze, buffers) select type, upper(vc) from vlsi where type=42;

 Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..67.97 rows=971 width=36) (actual time=0.012..0.212 rows=1000 loops=1)
   Index Cond: (type = '42'::numeric)
   Heap Fetches: 0
   Buffers: shared hit=17
 Planning time: 0.112 ms
 Execution time: 0.272 ms

1) When trying to apply "like condition", index only scan does not work.
Note: "buffers hit" becomes 977 instead of 17.

explain (analyze, buffers) select type, upper(vc) from vlsi where type=42 and upper(vc) like '%ABC%';

 Index Scan using type_vc__vlsi on vlsi  (cost=0.55..1715.13 rows=20 width=36) (actual time=0.069..1.343 rows=23 loops=1)
   Index Cond: (type = '42'::numeric)
   Filter: (upper((vc)::text) ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=939
 Planning time: 0.104 ms
 Execution time: 1.358 ms

Mere "subquery" does not help: still no index-only scan

2) explain (analyze, buffers) select * from (select type, upper(vc) upper_vc from vlsi where type=42) as x where upper_vc like '%ABC%';

 Index Scan using type_vc__vlsi on vlsi  (cost=0.55..1715.13 rows=20 width=36) (actual time=0.068..1.344 rows=23 loops=1)
   Index Cond: (type = '42'::numeric)
   Filter: (upper((vc)::text) ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=939
 Planning time: 0.114 ms
 Execution time: 1.357 ms

3) "offset 0" trick does help:
explain (analyze, buffers) select * from (select type, upper(vc) upper_vc from vlsi where type=42 offset 0) as x where upper_vc like '%ABC%';

 Subquery Scan on x  (cost=0.55..80.11 rows=39 width=36) (actual time=0.033..0.488 rows=23 loops=1)
   Filter: (x.upper_vc ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=17
   ->  Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..67.97 rows=971 width=36) (actual time=0.015..0.210 rows=1000 loops=1)
         Index Cond: (type = '42'::numeric)
         Heap Fetches: 0
         Buffers: shared hit=17
 Planning time: 0.086 ms
 Execution time: 0.503 ms

Vladimir

Re: Index Onlys Scan for expressions

От
Ildar Musin
Дата:
Hi Vladimir,

On 23.08.2016 23:35, Vladimir Sitnikov wrote:
> Hi,
>
> I've tried your indexonlypatch5.patch against REL9_6_BETA3.
> Here are some results.
>
> TL;DR:
> 1) <<where type=42 and upper(vc) like '%ABC%'>> does not support
> index-only scan for index (type, upper(vc) varchar_pattern_ops).
> 3) <<(... where type=42 offset 0) where upper_vc like '%ABC%'>> does
> trigger index-only scan. IOS reduces number of buffers from 977 to 17
> and that is impressive.
>
> Can IOS be used for simple query like #1 as well?
>

Thanks for checking out the patch. Sorry for the delayed reply.

> Here are the details.
>
> drop table vlsi;
> create table vlsi(type numeric, vc varchar(500));
> insert into vlsi(type,vc) select round(x/1000),
> md5('||x)||md5('||x+1)||md5(''||x+2) from generate_series(1,1000000) as
> s(x);
> create index type_vc__vlsi on vlsi(type, upper(vc) varchar_pattern_ops);
> vacuum analyze vlsi;
>
> 0) Smoke test (index only scan works when selecting indexed expression):
>
> explain (analyze, buffers) select type, upper(vc) from vlsi where type=42;
>
>  Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..67.97 rows=971
> width=36) (actual time=0.012..0.212 rows=1000 loops=1)
>    Index Cond: (type = '42'::numeric)
>    Heap Fetches: 0
>    Buffers: shared hit=17
>  Planning time: 0.112 ms
>  Execution time: 0.272 ms
>
> 1) When trying to apply "like condition", index only scan does not work.
> Note: "buffers hit" becomes 977 instead of 17.
>
> explain (analyze, buffers) select type, upper(vc) from vlsi where
> type=42 and upper(vc) like '%ABC%';
>
>  Index Scan using type_vc__vlsi on vlsi  (cost=0.55..1715.13 rows=20
> width=36) (actual time=0.069..1.343 rows=23 loops=1)
>    Index Cond: (type = '42'::numeric)
>    Filter: (upper((vc)::text) ~~ '%ABC%'::text)
>    Rows Removed by Filter: 977
>    Buffers: shared hit=939
>  Planning time: 0.104 ms
>  Execution time: 1.358 ms
>

The reason why this doesn't work is that '~~' operator (which is a 
synonym for 'like') isn't supported by operator class for btree. Since 
the only operators supported by btree are <, <=, =, >=, >, you can use 
it with queries like:

explain (analyze, buffers) select type, upper(vc) from vlsi where 
type=42 and upper(vc) ~~ 'ABC%';                                                         QUERY PLAN 


-----------------------------------------------------------------------------------------------------------------------------
IndexOnly Scan using type_vc__vlsi on vlsi  (cost=0.55..4.58 rows=1 
 
width=36) (actual time=0.021..0.021 rows=0 loops=1)   Index Cond: ((type = '42'::numeric) AND ((upper((vc)::text)) ~>=~

'ABC'::text) AND ((upper((vc)::text)) ~<~ 'ABD'::text))   Filter: ((upper((vc)::text)) ~~ 'ABC%'::text)   Heap Fetches:
0  Buffers: shared hit=4 Planning time: 0.214 ms Execution time: 0.044 ms
 
(7 rows)

In case of fixed prefix postgres implicitly substitutes '~~' operator 
with two range operators:

((upper((vc)::text)) ~>=~ 'ABC'::text) AND ((upper((vc)::text)) ~<~ 
'ABD'::text)

so that you can use these conditions to lookup in btree.

> Mere "subquery" does not help: still no index-only scan
>
> 2) explain (analyze, buffers) select * from (select type, upper(vc)
> upper_vc from vlsi where type=42) as x where upper_vc like '%ABC%';
>
>  Index Scan using type_vc__vlsi on vlsi  (cost=0.55..1715.13 rows=20
> width=36) (actual time=0.068..1.344 rows=23 loops=1)
>    Index Cond: (type = '42'::numeric)
>    Filter: (upper((vc)::text) ~~ '%ABC%'::text)
>    Rows Removed by Filter: 977
>    Buffers: shared hit=939
>  Planning time: 0.114 ms
>  Execution time: 1.357 ms
>
> 3) "offset 0" trick does help:
> explain (analyze, buffers) select * from (select type, upper(vc)
> upper_vc from vlsi where type=42 offset 0) as x where upper_vc like '%ABC%';
>
>  Subquery Scan on x  (cost=0.55..80.11 rows=39 width=36) (actual
> time=0.033..0.488 rows=23 loops=1)
>    Filter: (x.upper_vc ~~ '%ABC%'::text)
>    Rows Removed by Filter: 977
>    Buffers: shared hit=17
>    ->  Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..67.97
> rows=971 width=36) (actual time=0.015..0.210 rows=1000 loops=1)
>          Index Cond: (type = '42'::numeric)
>          Heap Fetches: 0
>          Buffers: shared hit=17
>  Planning time: 0.086 ms
>  Execution time: 0.503 ms
>
> Vladimir

I debugged the last two queries to figure out the difference between 
them. It turned out that that the query 2) transforms to the same as 
query 1). And in 3rd query 'OFFSET' statement prevents rewriter from 
transforming the query, so it is possible to use index only scan on 
subquery and then filter the result of subquery with '~~' operator.

-- 
Ildar Musin
i.musin@postgrespro.ru



Re: Index Onlys Scan for expressions

От
Tomas Vondra
Дата:
Hi Ildar,

I've looked at this patch again today to do a bit more thorough review,
and I think it's fine. There are a few comments (particularly in the new
code in check_index_only) that need improving, and also a few small
tweaks in the walkers.

Attached is a modified v5 patch with the proposed changes - it's
probably easier than explaining what the changes should/might be.

I've added an XXX comment in check_index_only_expr_walker - ISTM we're
first explicitly matching  Vars to index attributes, and then dealing
with expressions. But we loop over all index columns (including those
that can only really match Vars). Not sure if it makes any difference,
but is it worth differentiating between Var and non-Var expressions? Why
not to simply call match_index_to_operand() in both cases?

I've also tweaked a few comments to match project code style, and moved
a few variables into the block where they are used. But the latter is
probably matter of personal taste, I guess.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: Index Onlys Scan for expressions

От
Vladimir Sitnikov
Дата:
Ildar>The reason why this doesn't work is that '~~' operator (which is a
Ildar>synonym for 'like') isn't supported by operator class for btree. Since
Ildar>the only operators supported by btree are <, <=, =, >=, >, you can use
Ildar>it with queries like:

Ildar>And in 3rd query 'OFFSET' statement prevents rewriter from
Ildar>transforming the query, so it is possible to use index only scan on
Ildar>subquery and then filter the result of subquery with '~~' operator.

I'm afraid I do not follow you.
Note: query 3 is 100% equivalent of query 2, however query 3 takes 55 times less reads.
It looks like either an optimizer bug, or some missing feature in the "index only scan" logic.

Here's quote from "query 2" (note % are at both ends):  ... where type=42) as x where upper_vc like '%ABC%';

Note: I do NOT use "indexed scan" for the like operator. I'm very well aware
that LIKE patterns with leading % cannot be optimized to a btree range scan.
What I want is "use the first indexed column as index scan, then use the second column
for filtering".

As shown in "query 2" vs "query 3", PostgreSQL cannot come up with such a plan on its own
for some reason.

This is not a theoretical issue, but it is something that I use a lot with Oracle DB (it just creates a good plan for "query 2").

Vladimir

Re: Index Onlys Scan for expressions

От
Ildar Musin
Дата:
Hi Vladimir,

On 03.09.2016 19:31, Vladimir Sitnikov wrote:
> Ildar>The reason why this doesn't work is that '~~' operator (which is a
> Ildar>synonym for 'like') isn't supported by operator class for btree. Since
> Ildar>the only operators supported by btree are <, <=, =, >=, >, you can use
> Ildar>it with queries like:
>
> Ildar>And in 3rd query 'OFFSET' statement prevents rewriter from
> Ildar>transforming the query, so it is possible to use index only scan on
> Ildar>subquery and then filter the result of subquery with '~~' operator.
>
> I'm afraid I do not follow you.
> Note: query 3 is 100% equivalent of query 2, however query 3 takes 55
> times less reads.
> It looks like either an optimizer bug, or some missing feature in the
> "index only scan" logic.
>
> Here's quote from "query 2" (note % are at both ends):  ... where
> type=42) as x where upper_vc like '%ABC%';
>
> Note: I do NOT use "indexed scan" for the like operator. I'm very well aware
> that LIKE patterns with leading % cannot be optimized to a btree range scan.
> What I want is "use the first indexed column as index scan, then use the
> second column
> for filtering".
>
> As shown in "query 2" vs "query 3", PostgreSQL cannot come up with such
> a plan on its own
> for some reason.
>
> This is not a theoretical issue, but it is something that I use a lot
> with Oracle DB (it just creates a good plan for "query 2").
>
> Vladimir

Thanks, I get it now. The reason why it acts like this is that I used 
match_clause_to_index() function to determine if IOS can be used with 
the specified clauses. This function among other things checks if 
operator matches the index opfamily. Apparently this isn't correct. I 
wrote another prototype to test your case and it seems to work. But it's 
not ready for public yet, I'll publish it in 1-2 days.

-- 
Ildar Musin
i.musin@postgrespro.ru



Re: Index Onlys Scan for expressions

От
Ildar Musin
Дата:
Hi Tomas,

On 03.09.2016 14:37, Tomas Vondra wrote:
> Hi Ildar,
>
> I've looked at this patch again today to do a bit more thorough review,
> and I think it's fine. There are a few comments (particularly in the new
> code in check_index_only) that need improving, and also a few small
> tweaks in the walkers.
>
> Attached is a modified v5 patch with the proposed changes - it's
> probably easier than explaining what the changes should/might be.
>
> I've added an XXX comment in check_index_only_expr_walker - ISTM we're
> first explicitly matching  Vars to index attributes, and then dealing
> with expressions. But we loop over all index columns (including those
> that can only really match Vars). Not sure if it makes any difference,
> but is it worth differentiating between Var and non-Var expressions? Why
> not to simply call match_index_to_operand() in both cases?
>

Thank you! Yes, you're right and probably it doesn't worth it. I 
intended to optimize just a little bit since we already have attribute 
numbers that can be returned by index. But as you have already noted in 
comment it is a cheap check and it would barely be noticeable.

> I've also tweaked a few comments to match project code style, and moved
> a few variables into the block where they are used. But the latter is
> probably matter of personal taste, I guess.
>
>
> regards
>

Thanks for that, I have some difficulties in expressing myself in 
english, so it was very helpful.

Best regards,
-- 
Ildar Musin
i.musin@postgrespro.ru



Re: Index Onlys Scan for expressions

От
Ildar Musin
Дата:
Hi Vladimir,

On 05.09.2016 16:38, Ildar Musin wrote:
> Hi Vladimir,
>
> On 03.09.2016 19:31, Vladimir Sitnikov wrote:
>> Ildar>The reason why this doesn't work is that '~~' operator (which is a
>> Ildar>synonym for 'like') isn't supported by operator class for btree.
>> Since
>> Ildar>the only operators supported by btree are <, <=, =, >=, >, you
>> can use
>> Ildar>it with queries like:
>>
>> Ildar>And in 3rd query 'OFFSET' statement prevents rewriter from
>> Ildar>transforming the query, so it is possible to use index only scan on
>> Ildar>subquery and then filter the result of subquery with '~~' operator.
>>
>> I'm afraid I do not follow you.
>> Note: query 3 is 100% equivalent of query 2, however query 3 takes 55
>> times less reads.
>> It looks like either an optimizer bug, or some missing feature in the
>> "index only scan" logic.
>>
>> Here's quote from "query 2" (note % are at both ends):  ... where
>> type=42) as x where upper_vc like '%ABC%';
>>
>> Note: I do NOT use "indexed scan" for the like operator. I'm very well
>> aware
>> that LIKE patterns with leading % cannot be optimized to a btree range
>> scan.
>> What I want is "use the first indexed column as index scan, then use the
>> second column
>> for filtering".
>>
>> As shown in "query 2" vs "query 3", PostgreSQL cannot come up with such
>> a plan on its own
>> for some reason.
>>
>> This is not a theoretical issue, but it is something that I use a lot
>> with Oracle DB (it just creates a good plan for "query 2").
>>
>> Vladimir
>
> Thanks, I get it now. The reason why it acts like this is that I used
> match_clause_to_index() function to determine if IOS can be used with
> the specified clauses. This function among other things checks if
> operator matches the index opfamily. Apparently this isn't correct. I
> wrote another prototype to test your case and it seems to work. But it's
> not ready for public yet, I'll publish it in 1-2 days.
>

Here is a new patch version. I modified check_index_only_clauses() so
that it doesn't use match_clause_to_indexcol() anymore. Instead it
handles different types of expressions including binary operator
expressions, scalar array expressions, row compare expressions (e.g.
(a,b)<(1,2)) and null tests and tries to match each part of expression
to index regardless an operator. I reproduced your example and was able
to get index only scan on all queries. Could you please try the patch
and tell if it works for you?

--
Ildar Musin
i.musin@postgrespro.ru

Вложения

Re: Index Onlys Scan for expressions

От
Vladimir Sitnikov
Дата:
Ildar> Could you please try the patch and tell if it works for you?

I've tested patch6 against recent head. The patch applies with no problems.

The previous case (filter on top of i-o-s) is fixed. Great work.


However, it looks there are issues when accessing non-indexed columns.
The error is "ERROR: variable not found in subplan target list"
The case is 02_case2_fails.sql (see the gist link above)

The essence of the case is "create index on substr(vc, 1, 128)"
and assume that majority of the rows have length(vc)<128.
Under that conditions, it would be nice to do index-only-scan
and filter (like in my previous case), but detect "long" rows
and do additional recheck for them.

Vladimir

Re: Index Onlys Scan for expressions

От
Robert Haas
Дата:
On Thu, Sep 8, 2016 at 2:58 PM, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
> Ildar> Could you please try the patch and tell if it works for you?
>
> I've tested patch6 against recent head. The patch applies with no problems.
>
> The previous case (filter on top of i-o-s) is fixed. Great work.
>
> Here are the test cases and results:
> https://gist.github.com/vlsi/008e18e18b609fcaaec53d9cc210b7e2
>
> However, it looks there are issues when accessing non-indexed columns.
> The error is "ERROR: variable not found in subplan target list"
> The case is 02_case2_fails.sql (see the gist link above)
>
> The essence of the case is "create index on substr(vc, 1, 128)"
> and assume that majority of the rows have length(vc)<128.
> Under that conditions, it would be nice to do index-only-scan
> and filter (like in my previous case), but detect "long" rows
> and do additional recheck for them.

Based on this report, this patch appears to have bugs that would
preclude committing it, so I'm marking it "Returned with Feedback" for
this CommitFest, which is due to end shortly.  Ildar, please feel free
to resubmit once you've updated the patch.

FWIW, I think this is a good effort and hope to see it move forward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company