
Use of additional index columns in rows filtering

From: Maxim Ivanov
Date:
Hi All,

I'd like to report what seems to be a missing optimization opportunity or understand why it is not possible to achieve.

TLDR; an additional index column B specified in CREATE INDEX .. (A) INCLUDE (B)
is not used to filter rows in queries like WHERE B = $1 ORDER BY A during an
IndexScan. https://dbfiddle.uk/iehtq44L


Take the following database:


  CREATE TABLE t(
    a integer NOT NULL,
    b integer NOT NULL,
    d integer NOT NULL
  );

  CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);
  -- I'd expect the index above to behave like the index below for
  -- the purposes of this query
  -- CREATE UNIQUE INDEX ON t(a,b);

  INSERT INTO t(
    SELECT random() * 100000000 as a,
           random() * 3 as b,
           generate_series as d FROM generate_series(1,200000)
  ) ON CONFLICT DO NOTHING;


If we filter on the `a` and `b` columns while scanning an index created as
`(a) INCLUDE (b)`, it seems to fetch tuples from the heap to check the
condition `b = 4`, despite both columns being available in the index:

    SELECT * FROM  t WHERE a > 1000000 and b = 4 ORDER BY a ASC LIMIT 10;


Here is the plan (notice high "shared hit"):

  Limit  (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
    Output: a, b, d
    Buffers: shared hit=198307
    ->  Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1)
          Output: a, b, d
          Index Cond: (t.a > 1000000)
          Filter: (t.b = 4)
          Rows Removed by Filter: 197805
          Buffers: shared hit=198307
  Planning:
    Buffers: shared hit=30
  Planning Time: 0.201 ms
  Execution Time: 84.303 ms


And here is the plan with the index on (a,b):

  Limit  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
    Output: a, b, d
    Buffers: shared hit=613
    ->  Index Scan using t_a_b_idx on public.t  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0 loops=1)
          Output: a, b, d
          Index Cond: ((t.a > 1000000) AND (t.b = 4))
          Buffers: shared hit=613
  Planning:
    Buffers: shared hit=41
  Planning Time: 0.314 ms
  Execution Time: 6.910 ms


Because the query doesn't sort on `b`, only filters on it while sorting on `a`,
I'd expect the indexes `(a) INCLUDE (b)` and `(a,b)` to behave exactly the same
for this particular query.

Interestingly, an IndexOnlyScan is capable of using additional columns to
filter rows without fetching them from the heap, but only for visible tuples:

  VACUUM FREEZE t;
  SELECT a,b FROM  t WHERE a > 1000000 and b = 4 ORDER BY a ASC LIMIT 10;

  Limit  (cost=0.42..6619.76 rows=1 width=8) (actual time=18.479..18.480 rows=0 loops=1)
    Output: a, b
    Buffers: shared hit=662
    ->  Index Only Scan using t_a_include_b on public.t  (cost=0.42..6619.76 rows=1 width=8) (actual time=18.477..18.477 rows=0 loops=1)
          Output: a, b
          Index Cond: (t.a > 1000000)
          Filter: (t.b = 4)
          Rows Removed by Filter: 197771
          Heap Fetches: 0
          Buffers: shared hit=662

Removing the VACUUM makes it behave like the IndexScan and fetch candidate
tuples from the heap, all while returning zero rows in the result.


To make the query plans comparable I had to force an index scan in both cases with:

  SET enable_bitmapscan to off;
  SET enable_seqscan to off;
  SET max_parallel_workers_per_gather = 0;

A self-contained, fully reproducible example is at https://dbfiddle.uk/iehtq44L

Regards,
Maxim



Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:

On 2/15/23 09:57, Maxim Ivanov wrote:
> Hi All,
> 
> I'd like to report what seems to be a missing optimization 
> opportunity or understand why it is not possible to achieve.
> 
> TLDR; additional index column B specified in CREATE INDEX .. (A) 
> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1
> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L
> 
> ...
> 
> Here is the plan (notice high "shared hit"):
> 
>   Limit  (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
>     Output: a, b, d
>     Buffers: shared hit=198307
>     ->  Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1)
>           Output: a, b, d
>           Index Cond: (t.a > 1000000)
>           Filter: (t.b = 4)
>           Rows Removed by Filter: 197805
>           Buffers: shared hit=198307
>   Planning:
>     Buffers: shared hit=30
>   Planning Time: 0.201 ms
>   Execution Time: 84.303 ms
> 

Yeah. The reason for this behavior is pretty simple:

1) When matching clauses to indexes in match_clause_to_index(), we only
   look at key columns (nkeycolumns). We'd need to check all columns
   (ncolumns) and remember if the clause matched a key or included one.

2) index_getnext_slot would need to get "candidate" TIDs using
   conditions on keys, and then check the clauses on included
   columns.

Seems doable, unless I'm missing some fatal issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From: Tom Lane
Date:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> On 2/15/23 09:57, Maxim Ivanov wrote:
>> TLDR; additional index column B specified in CREATE INDEX .. (A) 
>> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1
>> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L

> Seems doable, unless I'm missing some fatal issue.

Partly this is lack of round tuits, but there's another significant
issue: there very likely are index entries corresponding to dead heap
rows.  Applying random user-defined quals to values found in such rows
could produce semantic anomalies; for example, divide-by-zero failures
even though you deleted all the rows having a zero in that column.
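
For a concrete illustration with the table t from upthread (a
hypothetical qual, purely to show the hazard):

  -- suppose all rows with b = 0 have just been deleted; their index
  -- entries can linger until VACUUM cleans them up
  DELETE FROM t WHERE b = 0;

  -- if the qual below were evaluated on raw index tuples before the
  -- heap liveness check, it could raise division_by_zero even though
  -- no visible row has b = 0
  SELECT * FROM t WHERE a > 1000000 AND 100 / b > 10;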

This isn't a problem for operators found in operator families, because
we trust those to not have undesirable side effects like raising
data-dependent errors.  But it'd be an issue if we started to apply
quals that aren't index quals directly to index rows before doing
the heap liveness check.  (And, of course, once you've fetched the
heap row there's no point in having a special path for columns
available from the index.)

            regards, tom lane



Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:

On 2/15/23 16:18, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> On 2/15/23 09:57, Maxim Ivanov wrote:
>>> TLDR; additional index column B specified in CREATE INDEX .. (A) 
>>> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1
>>> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L
> 
>> Seems doable, unless I'm missing some fatal issue.
> 
> Partly this is lack of round tuits, but there's another significant
> issue: there very likely are index entries corresponding to dead heap
> rows.  Applying random user-defined quals to values found in such rows
> could produce semantic anomalies; for example, divide-by-zero failures
> even though you deleted all the rows having a zero in that column.
> 
> This isn't a problem for operators found in operator families, because
> we trust those to not have undesirable side effects like raising
> data-dependent errors.  But it'd be an issue if we started to apply
> quals that aren't index quals directly to index rows before doing
> the heap liveness check.  (And, of course, once you've fetched the
> heap row there's no point in having a special path for columns
> available from the index.)

Sure, but we can do the same VM check as index-only scan, right?

That would save some of the I/O to fetch the heap tuple, as long as the
page is all-visible and the filter eliminates the tuples. It makes the
costing a bit trickier, because it needs to consider both how many pages
are all-visible and the selectivity of the condition.
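
As a side note, the all-visible fraction such costing needs is already
tracked in pg_class and used by IOS costing; a quick way to eyeball it
(a sketch):

  SELECT relpages, relallvisible,
         round(100.0 * relallvisible / nullif(relpages, 0), 1) AS pct
  FROM pg_class WHERE relname = 't';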


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From: Maxim Ivanov
Date:
> This isn't a problem for operators found in operator families, because
> we trust those to not have undesirable side effects like raising
> data-dependent errors. But it'd be an issue if we started to apply
> quals that aren't index quals directly to index rows before doing
> the heap liveness check. (And, of course, once you've fetched the
> heap row there's no point in having a special path for columns
> available from the index.)

Assuming operators are pure and don't have global side effects, is it possible
to ignore any error during that check? If the tuple is not visible it shouldn't
matter; if it is visible, then the error will be reported by the same routine
which does the filtering now (ExecQual?).


If not, then limiting this optimization to builtin ops is something I can live with :)
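
For what it's worth, the "trusted" set Tom mentions is the index opfamily;
for the btree integer_ops family used by t_a_include_b it can be listed
with a catalog query along these lines (a sketch):

  SELECT amopopr::regoperator AS operator, amopstrategy
  FROM pg_amop
  WHERE amopfamily = (SELECT opf.oid
                      FROM pg_opfamily opf
                      JOIN pg_am am ON am.oid = opf.opfmethod
                      WHERE opf.opfname = 'integer_ops'
                        AND am.amname = 'btree');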




Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:
Hi,

I took a stab at this and implemented the trick with the VM - during
index scan, we also extract the filters that only need the indexed
attributes (just like in IOS). And then, during the execution we:

  1) scan the index using the scan keys (as before)

  2) if the heap page is all-visible, we check the new filters that can
     be evaluated on the index tuple

  3) fetch the heap tuple and evaluate the filters

This is pretty much exactly the same thing we do for IOS, so I don't see
why this would be incorrect while IOS is correct.

This also adds "Index Filter" to explain output, to show which filters
are executed on the index tuple (at the moment the filters are a subset
of "Filter"), so if the index tuple matches we'll execute them again on
the heap tuple. I guess that could be fixed by having two "filter"
lists, depending on whether we were able to evaluate the index filters.

Most of the patch is pretty mechanical - particularly the planning part
is about identifying filters that can be evaluated on the index tuple,
and that code was mostly shamelessly copied from index-only scan.

The matching of filters to index is done in check_index_filter(), and
it's simpler than match_clause_to_indexcol() as it does not need to
consider operators etc. (I think). But maybe it should be careful about
other things, not sure.

The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
earlier, the idea is to check VM and evaluate the filters on the index
tuple if possible, similar to index-only scans. Except that we then have
to fetch the heap tuple. Unfortunately, this means the code can't use
index_getnext_slot() anymore. Perhaps we should invent a new variant
that'd allow evaluating the index filters in between.


With the patch applied, the query plan changes from:

                                 QUERY PLAN
    -------------------------------------------------------------------
     Limit  (cost=0.42..10929.89 rows=1 width=12)
            (actual time=94.649..94.653 rows=0 loops=1)
       Buffers: shared hit=197575 read=661
       ->  Index Scan using t_a_include_b on t
          (cost=0.42..10929.89 rows=1 width=12)
          (actual time=94.646..94.647 rows=0 loops=1)
             Index Cond: (a > 1000000)
             Filter: (b = 4)
             Rows Removed by Filter: 197780
             Buffers: shared hit=197575 read=661
     Planning Time: 0.091 ms
     Execution Time: 94.674 ms
    (9 rows)

to

                                 QUERY PLAN
    -------------------------------------------------------------------
     Limit  (cost=0.42..3662.15 rows=1 width=12)
            (actual time=13.663..13.667 rows=0 loops=1)
       Buffers: shared hit=544
       ->  Index Scan using t_a_include_b on t
           (cost=0.42..3662.15 rows=1 width=12)
           (actual time=13.659..13.660 rows=0 loops=1)
             Index Cond: (a > 1000000)
             Index Filter: (b = 4)
             Rows Removed by Index Recheck: 197780
             Filter: (b = 4)
             Buffers: shared hit=544
     Planning Time: 0.105 ms
     Execution Time: 13.690 ms
    (10 rows)

which is much closer to the "best" case:

                                 QUERY PLAN
    -------------------------------------------------------------------
     Limit  (cost=0.42..4155.90 rows=1 width=12)
            (actual time=10.152..10.156 rows=0 loops=1)
       Buffers: shared read=543
       ->  Index Scan using t_a_b_idx on t
           (cost=0.42..4155.90 rows=1 width=12)
           (actual time=10.148..10.150 rows=0 loops=1)
             Index Cond: ((a > 1000000) AND (b = 4))
             Buffers: shared read=543
     Planning Time: 0.089 ms
     Execution Time: 10.176 ms
    (7 rows)


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments

Re: Use of additional index columns in rows filtering

From: James Coleman
Date:
Hello,

I've cc'd Jeff Davis on this due to a conversation we had at PGCon
about applying filters on index tuples during index scans.

I've also cc'd Andres Freund because I think this relates to his
musing in [1] that:
> One thing I have been wondering around this is whether we should not have
> split the code for IOS and plain indexscans...

I think I remember Peter Geoghegan also wondering (I can't remember if
this was in conversation at PGCon about index skip scans or in a
hackers thread) about how we compose these various index scan
optimizations.

To be certain this is probably a thing to tackle as a follow-on to
this patch, but it does seem to me that what we are implicitly
realizing is that (unlike with bitmap scans, I think) it doesn't
really make a lot of conceptual sense to have index only scans be a
separate node from index scans. Instead it's likely better to consider
it an optimization to index scans that can dynamically kick in when
it's able to be of use. That would allow it to compose with e.g.
prefetching in the aforelinked thread. At the very least we would need
pragmatic (e.g., cost of dynamically applying optimizations) rather
than conceptual reasons to argue they should continue to be separate.

Apologies for that lengthy preamble; on to the patch under discussion:

On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I took a stab at this and implemented the trick with the VM - during
> index scan, we also extract the filters that only need the indexed
> attributes (just like in IOS). And then, during the execution we:
>
>   1) scan the index using the scan keys (as before)
>
>   2) if the heap page is all-visible, we check the new filters that can
>      be evaluated on the index tuple
>
>   3) fetch the heap tuple and evaluate the filters

Thanks for working on this; I'm excited about this class of work
(along with index prefetching and other ideas; I think there's a lot of
potential for improving index scans).

> This is pretty much exactly the same thing we do for IOS, so I don't see
> why this would be incorrect while IOS is correct.
>
> This also adds "Index Filter" to explain output, to show which filters
> are executed on the index tuple (at the moment the filters are a subset
> of "Filter"), so if the index tuple matches we'll execute them again on
> the heap tuple. I guess that could be fixed by having two "filter"
> lists, depending on whether we were able to evaluate the index filters.

Given that we show index filters and heap filters separately it seems
like we might want to maintain separate instrumentation counts of how
many tuples were filtered by each set of filters.

> Most of the patch is pretty mechanical - particularly the planning part
> is about identifying filters that can be evaluated on the index tuple,
> and that code was mostly shamelessly copied from index-only scan.
>
> The matching of filters to index is done in check_index_filter(), and
> it's simpler than match_clause_to_indexcol() as it does not need to
> consider operators etc. (I think). But maybe it should be careful about
> other things, not sure.

This would end up requiring some refactoring of the existing index
matching code (or alternative caching on IndexOptInfo), but
match_filter_to_index() calling check_index_filter() results in
constructing a bitmapset of index columns for every possible filter,
which seems wasteful (I recognize this is a bit of a proof-of-concept
level v1).

> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
> earlier, the idea is to check VM and evaluate the filters on the index
> tuple if possible, similar to index-only scans. Except that we then have
> to fetch the heap tuple. Unfortunately, this means the code can't use
> index_getnext_slot() anymore. Perhaps we should invent a new variant
> that'd allow evaluating the index filters in between.

It does seem there are some refactoring opportunities there.

> With the patch applied, the query plan changes from:
>
> ...
>
> to
>
>                                  QUERY PLAN
>     -------------------------------------------------------------------
>      Limit  (cost=0.42..3662.15 rows=1 width=12)
>             (actual time=13.663..13.667 rows=0 loops=1)
>        Buffers: shared hit=544
>        ->  Index Scan using t_a_include_b on t
>            (cost=0.42..3662.15 rows=1 width=12)
>            (actual time=13.659..13.660 rows=0 loops=1)
>              Index Cond: (a > 1000000)
>              Index Filter: (b = 4)
>              Rows Removed by Index Recheck: 197780
>              Filter: (b = 4)
>              Buffers: shared hit=544
>      Planning Time: 0.105 ms
>      Execution Time: 13.690 ms
>     (10 rows)
>
> ...

I did also confirm that this properly identifies cases Jeff had
mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
= 4))".

I noticed also you still had questions/TODOs about handling index
scans for join clauses.

Regards,
James Coleman

1: https://www.postgresql.org/message-id/20230609000600.syqy447e6metnvyj%40awork3.anarazel.de



Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:

On 6/21/23 14:45, James Coleman wrote:
> Hello,
> 
> I've cc'd Jeff Davis on this due to a conversation we had at PGCon
> about applying filters on index tuples during index scans.
> 
> I've also cc'd Andres Freund because I think this relates to his
> musing in [1] that:
>> One thing I have been wondering around this is whether we should not have
>> split the code for IOS and plain indexscans...
> 
> I think I remember Peter Geoghegan also wondering (I can't remember if
> this was in conversation at PGCon about index skip scans or in a
> hackers thread) about how we compose these various index scan
> optimizations.
> 
> To be certain this is probably a thing to tackle as a follow-on to
> this patch, but it does seem to me that what we are implicitly
> realizing is that (unlike with bitmap scans, I think) it doesn't
> really make a lot of conceptual sense to have index only scans be a
> separate node from index scans. Instead it's likely better to consider
> it an optimization to index scans that can dynamically kick in when
> it's able to be of use. That would allow it to compose with e.g.
> prefetching in the aforelinked thread. At the very least we would need
> pragmatic (e.g., cost of dynamically applying optimizations) rather
> than conceptual reasons to argue they should continue to be separate.
> 

I agree it seems a bit weird to have IOS as a separate node. In a way, I
think there are two dimensions for "index-only" scans - which pages can
be scanned like that, and which clauses can be evaluated with only the
index tuple. The current approach focuses on page visibility, but
ignores the other aspect entirely. Or more precisely, it disables IOS
entirely as soon as there's a single condition requiring the heap tuple.

I agree it's probably better to see this as a single node with various
optimizations that can be applied when possible / efficient (based on
planner and/or dynamically).

I'm not sure I see a direct link to the prefetching patch, but it's true
that it needs to deal with TIDs (instead of slots), just like IOS. So if
the node worked with tids, maybe the prefetching could be done at that
level (which I now realize may be what Andres meant by doing prefetching
in the executor).

> Apologies for that lengthy preamble; on to the patch under discussion:
> 
> On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Hi,
>>
>> I took a stab at this and implemented the trick with the VM - during
>> index scan, we also extract the filters that only need the indexed
>> attributes (just like in IOS). And then, during the execution we:
>>
>>   1) scan the index using the scan keys (as before)
>>
>>   2) if the heap page is all-visible, we check the new filters that can
>>      be evaluated on the index tuple
>>
>>   3) fetch the heap tuple and evaluate the filters
> 
> Thanks for working on this; I'm excited about this class of work
> (along with index prefetching and other ideas I think there's a lot of
> potential for improving index scans).
> 
>> This is pretty much exactly the same thing we do for IOS, so I don't see
>> why this would be incorrect while IOS is correct.
>>
>> This also adds "Index Filter" to explain output, to show which filters
>> are executed on the index tuple (at the moment the filters are a subset
>> of "Filter"), so if the index tuple matches we'll execute them again on
>> the heap tuple. I guess that could be fixed by having two "filter"
>> lists, depending on whether we were able to evaluate the index filters.
> 
> Given that we show index filters and heap filters separately it seems
> like we might want to maintain separate instrumentation counts of how
> many tuples were filtered by each set of filters.
> 

Yeah, separate instrumentation counters would be useful. What I was
talking about was more about the conditions themselves, because right now we
re-evaluate the index-only clauses on the heap tuple.

Imagine an index on t(a) and a query that has WHERE (a = 1) AND (b = 2).
The patch splits this into two lists:

index-only clauses: (a=1)
clauses: (a=1) AND (b=2)

So we evaluate (a=1) first, and then we fetch the heap tuple and check
"clauses" again, which however includes (a=1) again. For cheap clauses
(or when (a=1) eliminated a lot of tuples using just the index) that's
not a big deal, but for expensive clauses it might hurt.

It's fixable, we'd just need to keep two versions of the "clauses" list,
one for IOS mode (when index-only clauses were checked) and a complete
one when we need to check all clauses.

>> Most of the patch is pretty mechanical - particularly the planning part
>> is about identifying filters that can be evaluated on the index tuple,
>> and that code was mostly shamelessly copied from index-only scan.
>>
>> The matching of filters to index is done in check_index_filter(), and
>> it's simpler than match_clause_to_indexcol() as it does not need to
>> consider operators etc. (I think). But maybe it should be careful about
>> other things, not sure.
> 
> This would end up requiring some refactoring of the existing index
> matching code (or alternative caching on IndexOptInfo), but
> match_filter_to_index() calling check_index_filter() results in
> constructs a bitmapset of index columns for every possible filter
> which seems wasteful (I recognize this is a bit of a proof-of-concept
> level v1).
> 

Probably. I'm sure there are a couple of other places where the current
API is a bit cumbersome and we could optimize.

>> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
>> earlier, the idea is to check VM and evaluate the filters on the index
>> tuple if possible, similar to index-only scans. Except that we then have
>> to fetch the heap tuple. Unfortunately, this means the code can't use
>> index_getnext_slot() anymore. Perhaps we should invent a new variant
>> that'd allow evaluating the index filters in between.
> 
> It does seem there are some refactoring opportunities there.
> 

Actually, I realized maybe we should switch this to index_getnext_tid()
because of the prefetching patch. That would allow us to introduce a
"buffer" of TIDs, populated by the index_getnext_tid(), and then do
prefetching based on that. It's similar to what bitmap scans do, except
that instead of the tbm iterator we get items from index_getnext_tid().

I haven't tried implementing this yet, but I kinda like the idea as it
works no matter what exactly the AM does (i.e. it'd work even for cases
like GiST with distance searches).


>> With the patch applied, the query plan changes from:
>>
>> ...
>>
>> to
>>
>>                                  QUERY PLAN
>>     -------------------------------------------------------------------
>>      Limit  (cost=0.42..3662.15 rows=1 width=12)
>>             (actual time=13.663..13.667 rows=0 loops=1)
>>        Buffers: shared hit=544
>>        ->  Index Scan using t_a_include_b on t
>>            (cost=0.42..3662.15 rows=1 width=12)
>>            (actual time=13.659..13.660 rows=0 loops=1)
>>              Index Cond: (a > 1000000)
>>              Index Filter: (b = 4)
>>              Rows Removed by Index Recheck: 197780
>>              Filter: (b = 4)
>>              Buffers: shared hit=544
>>      Planning Time: 0.105 ms
>>      Execution Time: 13.690 ms
>>     (10 rows)
>>
>> ...
> 
> I did also confirm that this properly identifies cases Jeff had
> mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
> = 4))".
> 

Good!

> I noticed also you still had questions/TODOs about handling index
> scans for join clauses.
> 

Not sure which questions/TODOs you refer to, but I don't recall any
issues with join clauses. But maybe I just forgot.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From: James Coleman
Date:
On Wed, Jun 21, 2023 at 11:28 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 6/21/23 14:45, James Coleman wrote:
> > Hello,
> >
> > I've cc'd Jeff Davis on this due to a conversation we had at PGCon
> > about applying filters on index tuples during index scans.
> >
> > I've also cc'd Andres Freund because I think this relates to his
> > musing in [1] that:
> >> One thing I have been wondering around this is whether we should not have
> >> split the code for IOS and plain indexscans...
> >
> > I think I remember Peter Geoghegan also wondering (I can't remember if
> > this was in conversation at PGCon about index skip scans or in a
> > hackers thread) about how we compose these various index scan
> > optimizations.
> >
> > To be certain this is probably a thing to tackle as a follow-on to
> > this patch, but it does seem to me that what we are implicitly
> > realizing is that (unlike with bitmap scans, I think) it doesn't
> > really make a lot of conceptual sense to have index only scans be a
> > separate node from index scans. Instead it's likely better to consider
> > it an optimization to index scans that can dynamically kick in when
> > it's able to be of use. That would allow it to compose with e.g.
> > prefetching in the aforelinked thread. At the very least we would need
> > pragmatic (e.g., cost of dynamically applying optimizations) rather
> > than conceptual reasons to argue they should continue to be separate.
> >
>
> I agree it seems a bit weird to have IOS as a separate node. In a way, I
> think there are two dimensions for "index-only" scans - which pages can
> be scanned like that, and which clauses can be evaluated with only the
> index tuple. The current approach focuses on page visibility, but
> ignores the other aspect entirely. Or more precisely, it disables IOS
> entirely as soon as there's a single condition requiring heap tuple.
>
> I agree it's probably better to see this as a single node with various
> optimizations that can be applied when possible / efficient (based on
> planner and/or dynamically).
>
> I'm not sure I see a direct link to the prefetching patch, but it's true
> that needs to deal with tids (instead of slots), just like IOS. So if
> the node worked with tids, maybe the prefetching could be done at that
> level (which I now realize may be what Andres meant by doing prefetching
> in the executor).

The link to prefetching is that IOS (as a separate node) won't benefit
from prefetching (I think) with your current prefetching patch (in the
case where the VM doesn't allow us to just use the index tuple),
whereas if the nodes were combined that would more naturally be
composable.

> > Apologies for that lengthy preamble; on to the patch under discussion:
> >
> > On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> Hi,
> >>
> >> I took a stab at this and implemented the trick with the VM - during
> >> index scan, we also extract the filters that only need the indexed
> >> attributes (just like in IOS). And then, during the execution we:
> >>
> >>   1) scan the index using the scan keys (as before)
> >>
> >>   2) if the heap page is all-visible, we check the new filters that can
> >>      be evaluated on the index tuple
> >>
> >>   3) fetch the heap tuple and evaluate the filters
> >
> > Thanks for working on this; I'm excited about this class of work
> > (along with index prefetching and other ideas I think there's a lot of
> > potential for improving index scans).
> >
> >> This is pretty much exactly the same thing we do for IOS, so I don't see
> >> why this would be incorrect while IOS is correct.
> >>
> >> This also adds "Index Filter" to explain output, to show which filters
> >> are executed on the index tuple (at the moment the filters are a subset
> >> of "Filter"), so if the index tuple matches we'll execute them again on
> >> the heap tuple. I guess that could be fixed by having two "filter"
> >> lists, depending on whether we were able to evaluate the index filters.
> >
> > Given that we show index filters and heap filters separately it seems
> > like we might want to maintain separate instrumentation counts of how
> > many tuples were filtered by each set of filters.
> >
>
> Yeah, separate instrumentation counters would be useful. What I was
> talking about was more about the conditions themselves, because right now we
> re-evaluate the index-only clauses on the heap tuple.
>
> Imagine an index on t(a) and a query that has WHERE (a = 1) AND (b = 2).
> The patch splits this into two lists:
>
> index-only clauses: (a=1)
> clauses: (a=1) AND (b=2)
>
> So we evaluate (a=1) first, and then we fetch the heap tuple and check
> "clauses" again, which however includes (a=1) again. For cheap clauses
> (or when (a=1) eliminated a lot of tuples using just the index) that's
> not a big deal, but for expensive clauses it might hurt.
>
> It's fixable, we'd just need to keep two versions of the "clauses" list,
> one for IOS mode (when index-only clauses were checked) and a complete
> one when we need to check all clauses.

In some cases (where the VM doesn't allow us to use just the index
tuple) we'd have to execute both lists against the heap tuple, right?

> >> Most of the patch is pretty mechanical - particularly the planning part
> >> is about identifying filters that can be evaluated on the index tuple,
> >> and that code was mostly shamelessly copied from index-only scan.
> >>
> >> The matching of filters to index is done in check_index_filter(), and
> >> it's simpler than match_clause_to_indexcol() as it does not need to
> >> consider operators etc. (I think). But maybe it should be careful about
> >> other things, not sure.
> >
> > This would end up requiring some refactoring of the existing index
> > matching code (or alternative caching on IndexOptInfo), but
> > match_filter_to_index() calling check_index_filter() results in
> > constructing a bitmapset of index columns for every possible filter,
> > which seems wasteful (I recognize this is a bit of a proof-of-concept
> > level v1).
> >
>
> Probably, I'm sure there's a couple other places where the current API
> was a bit cumbersome and we could optimize.
>
> >> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
> >> earlier, the idea is to check VM and evaluate the filters on the index
> >> tuple if possible, similar to index-only scans. Except that we then have
> >> to fetch the heap tuple. Unfortunately, this means the code can't use
> >> index_getnext_slot() anymore. Perhaps we should invent a new variant
> >> that'd allow evaluating the index filters in between.
> >
> > It does seem there are some refactoring opportunities there.
> >
>
> Actually, I realized maybe we should switch this to index_getnext_tid()
> because of the prefetching patch. That would allow us to introduce a
> "buffer" of TIDs, populated by the index_getnext_tid(), and then do
> prefetching based on that. It's similar to what bitmap scans do, except
> that instead of the tbm iterator we get items from index_getnext_tid().
>
> I haven't tried implementing this yet, but I kinda like the idea as it
> works no matter what exactly the AM does (i.e. it'd work even for cases
> like GiST with distance searches).

Oh, interesting, I'll let you keep chewing on that then.

> >> With the patch applied, the query plan changes from:
> >>
> >> ...
> >>
> >> to
> >>
> >>                                  QUERY PLAN
> >>     -------------------------------------------------------------------
> >>      Limit  (cost=0.42..3662.15 rows=1 width=12)
> >>             (actual time=13.663..13.667 rows=0 loops=1)
> >>        Buffers: shared hit=544
> >>        ->  Index Scan using t_a_include_b on t
> >>            (cost=0.42..3662.15 rows=1 width=12)
> >>            (actual time=13.659..13.660 rows=0 loops=1)
> >>              Index Cond: (a > 1000000)
> >>              Index Filter: (b = 4)
> >>              Rows Removed by Index Recheck: 197780
> >>              Filter: (b = 4)
> >>              Buffers: shared hit=544
> >>      Planning Time: 0.105 ms
> >>      Execution Time: 13.690 ms
> >>     (10 rows)
> >>
> >> ...
> >
> > I did also confirm that this properly identifies cases Jeff had
> > mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
> > = 4))".
> >
>
> Good!
>
> > I noticed also you still had questions/TODOs about handling index
> > scans for join clauses.
> >
>
> Not sure which questions/TODOs you refer to, but I don't recall any
> issues with join clauses. But maybe I just forgot.

I was referring to the comment:

+ * FIXME Maybe this should fill the filterset too?

above match_eclass_clauses_to_index()'s definition.

Regards,
James



Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:

On 6/21/23 18:17, James Coleman wrote:
> On Wed, Jun 21, 2023 at 11:28 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>>
>>
>> On 6/21/23 14:45, James Coleman wrote:
>>> Hello,
>>>
>>> I've cc'd Jeff Davis on this due to a conversation we had at PGCon
>>> about applying filters on index tuples during index scans.
>>>
>>> I've also cc'd Andres Freund because I think this relates to his
>>> musing in [1] that:
>>>> One thing I have been wondering around this is whether we should not have
>>>> split the code for IOS and plain indexscans...
>>>
>>> I think I remember Peter Geoghegan also wondering (I can't remember if
>>> this was in conversation at PGCon about index skip scans or in a
>>> hackers thread) about how we compose these various index scan
>>> optimizations.
>>>
>>> To be certain this is probably a thing to tackle as a follow-on to
>>> this patch, but it does seem to me that what we are implicitly
>>> realizing is that (unlike with bitmap scans, I think) it doesn't
>>> really make a lot of conceptual sense to have index only scans be a
>>> separate node from index scans. Instead it's likely better to consider
>>> it an optimization to index scans that can dynamically kick in when
>>> it's able to be of use. That would allow it to compose with e.g.
>>> prefetching in the aforelinked thread. At the very least we would need
>>> pragmatic (e.g., cost of dynamically applying optimizations) rather
>>> than conceptual reasons to argue they should continue to be separate.
>>>
>>
>> I agree it seems a bit weird to have IOS as a separate node. In a way, I
>> think there are two dimensions for "index-only" scans - which pages can
>> be scanned like that, and which clauses can be evaluated with only the
>> index tuple. The current approach focuses on page visibility, but
>> ignores the other aspect entirely. Or more precisely, it disables IOS
>> entirely as soon as there's a single condition requiring heap tuple.
>>
>> I agree it's probably better to see this as a single node with various
>> optimizations that can be applied when possible / efficient (based on
>> planner and/or dynamically).
>>
>> I'm not sure I see a direct link to the prefetching patch, but it's true
>> that needs to deal with tids (instead of slots), just like IOS. So if
>> the node worked with tids, maybe the prefetching could be done at that
>> level (which I now realize may be what Andres meant by doing prefetching
>> in the executor).
> 
> The link to prefetching is that IOS (as a separate node) won't benefit
> from prefetching (I think) with your current prefetching patch (in the
> case where the VM doesn't allow us to just use the index tuple),
> whereas if the nodes were combined that would more naturally be
> composable.
> 

Yeah, mostly. Although just unifying "regular" indexscans and IOS would
not allow prefetching for IOS.

The reason why the current design does not allow doing prefetching for
IOS is that the prefetching happens deep in indexam.c, and down there we
don't know which TIDs are not from all-visible pages and would need
prefetching. Which is another good reason to do the prefetching at the
executor level, I believe.

>>> Apologies for that lengthy preamble; on to the patch under discussion:
>>>
>>> On Thu, Jun 8, 2023 at 1:34 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I took a stab at this and implemented the trick with the VM - during
>>>> index scan, we also extract the filters that only need the indexed
>>>> attributes (just like in IOS). And then, during the execution we:
>>>>
>>>>   1) scan the index using the scan keys (as before)
>>>>
>>>>   2) if the heap page is all-visible, we check the new filters that can
>>>>      be evaluated on the index tuple
>>>>
>>>>   3) fetch the heap tuple and evaluate the filters
>>>
>>> Thanks for working on this; I'm excited about this class of work
>>> (along with index prefetching and other ideas I think there's a lot of
>>> potential for improving index scans).
>>>
>>>> This is pretty much exactly the same thing we do for IOS, so I don't see
>>>> why this would be incorrect while IOS is correct.
>>>>
>>>> This also adds "Index Filter" to explain output, to show which filters
>>>> are executed on the index tuple (at the moment the filters are a subset
>>>> of "Filter"), so if the index tuple matches we'll execute them again on
>>>> the heap tuple. I guess that could be fixed by having two "filter"
>>>> lists, depending on whether we were able to evaluate the index filters.
>>>
>>> Given that we show index filters and heap filters separately it seems
>>> like we might want to maintain separate instrumentation counts of how
>>> many tuples were filtered by each set of filters.
>>>
>>
>> Yeah, separate instrumentation counters would be useful. What I was
>> talking about was more about the conditions themselves, because right now we
>> re-evaluate the index-only clauses on the heap tuple.
>>
>> Imagine an index on t(a) and a query that has WHERE (a = 1) AND (b = 2).
>> The patch splits this into two lists:
>>
>> index-only clauses: (a=1)
>> clauses: (a=1) AND (b=2)
>>
>> So we evaluate (a=1) first, and then we fetch the heap tuple and check
>> "clauses" again, which however includes (a=1) again. For cheap clauses
>> (or when (a=1) eliminated a lot of tuples using just the index) that's
>> not a big deal, but for expensive clauses it might hurt.
>>
>> It's fixable, we'd just need to keep two versions of the "clauses" list,
>> one for IOS mode (when index-only clauses were checked) and a complete
>> one when we need to check all clauses.
> 
> In some cases (where the VM doesn't allow us to use just the index
> tuple) we'd have to execute both lists against the heap tuple, right?
> 

Not quite. I suspect you imagine we'd have two lists

L1:  (a=1)
L2:  (b=1)

and that for all-visible pages we'd check L1 on index, and then maybe L2
on heap. And for non-all-visible pages we'd check both on heap.

But that doesn't work, because L1 references attnums in the index
tuple, while L2 references attnums in the heap tuple.

So we'd need

L1:  (a=1)  -> against index tuple
L2:  (b=1)  -> against heap tuple
L3:  (a=1) AND (b=1) -> against heap tuple

And for non-all-visible pages we'd only use L3. (I wonder if we could
check if the tuple is visible and then go back and check L1 on index
tuple, but I doubt it'd be really more efficient.)


>>>> Most of the patch is pretty mechanical - particularly the planning part
>>>> is about identifying filters that can be evaluated on the index tuple,
>>>> and that code was mostly shamelessly copied from index-only scan.
>>>>
>>>> The matching of filters to index is done in check_index_filter(), and
>>>> it's simpler than match_clause_to_indexcol() as it does not need to
>>>> consider operators etc. (I think). But maybe it should be careful about
>>>> other things, not sure.
>>>
>>> This would end up requiring some refactoring of the existing index
>>> matching code (or alternative caching on IndexOptInfo), but
>>> match_filter_to_index() calling check_index_filter() results in
>>> constructing a bitmapset of index columns for every possible filter,
>>> which seems wasteful (I recognize this is a bit of a proof-of-concept
>>> level v1).
>>>
>>
>> Probably, I'm sure there's a couple other places where the current API
>> was a bit cumbersome and we could optimize.
>>
>>>> The actual magic happens in IndexNext (nodeIndexscan.c). As mentioned
>>>> earlier, the idea is to check VM and evaluate the filters on the index
>>>> tuple if possible, similar to index-only scans. Except that we then have
>>>> to fetch the heap tuple. Unfortunately, this means the code can't use
>>>> index_getnext_slot() anymore. Perhaps we should invent a new variant
>>>> that'd allow evaluating the index filters in between.
>>>
>>> It does seem there are some refactoring opportunities there.
>>>
>>
>> Actually, I realized maybe we should switch this to index_getnext_tid()
>> because of the prefetching patch. That would allow us to introduce a
>> "buffer" of TIDs, populated by the index_getnext_tid(), and then do
>> prefetching based on that. It's similar to what bitmap scans do, except
>> that instead of the tbm iterator we get items from index_getnext_tid().
>>
>> I haven't tried implementing this yet, but I kinda like the idea as it
>> works no matter what exactly the AM does (i.e. it'd work even for cases
>> like GiST with distance searches).
> 
> Oh, interesting, I'll let you keep chewing on that then.
> 

Cool!

>>>> With the patch applied, the query plan changes from:
>>>>
>>>> ...
>>>>
>>>> to
>>>>
>>>>                                  QUERY PLAN
>>>>     -------------------------------------------------------------------
>>>>      Limit  (cost=0.42..3662.15 rows=1 width=12)
>>>>             (actual time=13.663..13.667 rows=0 loops=1)
>>>>        Buffers: shared hit=544
>>>>        ->  Index Scan using t_a_include_b on t
>>>>            (cost=0.42..3662.15 rows=1 width=12)
>>>>            (actual time=13.659..13.660 rows=0 loops=1)
>>>>              Index Cond: (a > 1000000)
>>>>              Index Filter: (b = 4)
>>>>              Rows Removed by Index Recheck: 197780
>>>>              Filter: (b = 4)
>>>>              Buffers: shared hit=544
>>>>      Planning Time: 0.105 ms
>>>>      Execution Time: 13.690 ms
>>>>     (10 rows)
>>>>
>>>> ...
>>>
>>> I did also confirm that this properly identifies cases Jeff had
>>> mentioned to me like "Index Filter: (((a * 2) > 500000) AND ((b % 10)
>>> = 4))".
>>>
>>
>> Good!
>>
>>> I noticed also you still had questions/TODOs about handling index
>>> scans for join clauses.
>>>
>>
>> Not sure which questions/TODOs you refer to, but I don't recall any
>> issues with join clauses. But maybe I just forgot.
> 
> I was referring to the comment:
> 
> + * FIXME Maybe this should fill the filterset too?
> 
> above match_eclass_clauses_to_index()'s definition.
> 

Ah, yeah. I'm sure there are some loose ends in the matching code.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:
Hi,

here's a minor update of the patch, rebased onto current master and
addressing a couple of issues reported by cfbot. Most are minor tweaks,
but the last one (4) is a somewhat more serious issue.


1) "tid" might have not been initialized in the IndexNext loop


2) add enable_indexonlyfilter GUC to postgresql.conf.sample (which is
checked by one regression test)


3) accept a couple of plan changes, either switching to an index scan
(thanks to the costing changes) or showing the extra index-only filters
in the explain output. The plan changes seem reasonable.


4) problems with opcintype != opckeytype (name_ops)

While running the tests, I ran into an issue with name_ops, causing
failures for \dT and other catalog queries. The root cause is that
name_ops has opcintype = name, but opckeytype = cstring. The index-only
clauses are copied from the table, with Vars mutated to reference the
INDEX_VAR. But the type is not, so when we get to evaluating the
expressions, CheckVarSlotCompatibility() fails because the Var has name,
but the iss_IndexSlot (created with index tuple descriptor) has cstring.
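
The mismatch is visible directly in the catalog:

  SELECT opcname, opcintype::regtype, opckeytype::regtype
  FROM pg_opclass
  WHERE opcname = 'name_ops';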

The rebased patch fixes this by explicitly adjusting types of the
descriptor in ExecInitIndexScan().

However, maybe this indicates the very idea of evaluating expressions
using a slot with the index tuple descriptor is misguided. This made me
look at the regular index-only scan (nodeIndexonlyscan.c), which uses a
slot with the "table" structure, and instead of evaluating the expression
on the index tuple it expands the index tuple into the table slot. Which
is what StoreIndexTuple() does.

So maybe this should do what IOS does - expand the index tuple into
"table slot" and evaluate the expression on that. That'd also make the
INDEX_VAR tweak in createplan.c unnecessary - in fact, that seemed a bit
strange anyway, so ditching fix_indexfilter_mutator would be good.


However, I wonder if the stuff StoreIndexTuple() is doing is actually
safe. I mean, it's essentially copying values from the index tuple into
the slot, ignoring the type difference. What if opcintype and opckeytype
are not binary compatible? Is it possible to define an opclass with such
opckeytype? I haven't noticed any check enforcing such compatibility ...

Also, it's a bit confusing that the SGML docs say opckeytype is not
supported for btree, but name_ops clearly uses it. Later I found it's
actually mentioned in pg_opclass.dat as a hack, to save space in catalogs.

But then btree also has amstorage=false ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments

Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:
On 7/15/23 16:20, Tomas Vondra wrote:
>
> ...
> 
> 4) problems with opcintype != opckeytype (name_ops)
> 
> While running the tests, I ran into an issue with name_ops, causing
> failures for \dT and other catalog queries. The root cause is that
> name_ops has opcintype = name, but opckeytype = cstring. The index-only
> clauses are copied from the table, with Vars mutated to reference the
> INDEX_VAR. But the type is not, so when we get to evaluating the
> expressions, CheckVarSlotCompatibility() fails because the Var has name,
> but the iss_IndexSlot (created with index tuple descriptor) has cstring.
> 
> The rebased patch fixes this by explicitly adjusting types of the
> descriptor in ExecInitIndexScan().
> 
> However, maybe this indicates the very idea of evaluating expressions
> using slot with index tuple descriptor is misguided. This made me look
> at regular index-only scan (nodeIndexonlyscan.c), and that uses a slot
> with the "table" structure, and instead of evaluating the expression on
> the index tuple it expands the index tuple into the table slot.
> Which is what StoreIndexTuple() does.
> 
> So maybe this should do what IOS does - expand the index tuple into
> "table slot" and evaluate the expression on that. That'd also make the
> INDEX_VAR tweak in createplan.c unnecessary - in fact, that seemed a bit
> strange anyway, so ditching fix_indexfilter_mutator would be good.
> 

This kept bothering me, so I looked at it today, and reworked it to use
the IOS approach. It's a bit more complicated because for IOS both slots
have the same overall structure, except for the data types. But for
regular index scans that's not the case - the code has to "expand" the
index tuple into the larger "table slot". This works, and in general I
think the result is much cleaner - in particular, it means we don't need
to switch the Var nodes to reference the INDEX_VAR.

While experimenting with this I realized again that we're not matching
expressions to IOS. So if you have an expression index on (a+b), that
can't be used even if the query only uses this particular expression.
The same limitation applies to index-only filters, of course. It's not
the fault of this patch, but perhaps it'd be an interesting improvement.
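
For example (a sketch; the index name is made up):

  CREATE INDEX t_expr_idx ON t ((a + b));

  -- the qual can use the index, but the scan is not index-only: the
  -- (a + b) in the target list is not matched to the index expression,
  -- so the heap still has to be visited
  EXPLAIN SELECT a + b FROM t WHERE a + b < 100;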


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments

Re: Use of additional index columns in rows filtering

From: Jeff Davis
Date:
Hi,


On Sun, 2023-07-16 at 22:36 +0200, Tomas Vondra wrote:
> This kept bothering me, so I looked at it today, and reworked it to
> use
> the IOS approach.

Initial comments on patch 20230716:

* check_index_filter() already looks at "canreturn", which should mean
that you don't need to later check for opcintype<>opckeytype. But
there's a comment in IndexNext() indicating that's a problem -- under
what conditions is it a problem?

* (may be a matter of taste) Recomputing the bitmapset from the
canreturn array in check_index_filter() for each call seems awkward. I
would just iterate through the bitmapset and check that all are set
true in the amcanreturn array.

* There are some tiny functions that don't seem to add much value or
have slightly weird APIs. For instance, match_filter_to_index() could
probably just return a boolean, and maybe doesn't even need to exist
because it's such a thin wrapper over check_index_filter(). Similarly
for fix_indexfilter_clause(). I'm OK with tiny functions even if the
only value is a comment, but I didn't find these particularly helpful.

* fix_indexfilter_references() could use a better comment. Perhaps
refactor so that you can share code with fix_indexqual_references()?

* it looks like index filters are duplicated with ordinary filters; is
there a reason for that?

* I'm confused about the relationship of an IOS to an index filter. It
seems like the index filter only works for an ordinary index scan? Why
is that?

Regards,
    Jeff Davis




Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:
On 7/18/23 22:21, Jeff Davis wrote:
> Hi,
> 
> 
> On Sun, 2023-07-16 at 22:36 +0200, Tomas Vondra wrote:
>> This kept bothering me, so I looked at it today, and reworked it to
>> use
>> the IOS approach.
> 
> Initial comments on patch 20230716:
> 
> * check_index_filter() already looks at "canreturn", which should mean
> that you don't need to later check for opcintype<>opckeytype. But
> there's a comment in IndexNext() indicating that's a problem -- under
> what conditions is it a problem?
> 

The comment in IndexNext() is a bit obsolete. There was an issue when
using a slot matching the index, because then StoreIndexTuple might fail
because of type mismatch (as explained in [1]). But that's no longer an
issue, thanks to switching to the table slot in the last patch version.

> * (may be a matter of taste) Recomputing the bitmapset from the
> canreturn array in check_index_filter() for each call seems awkward. I
> would just iterate through the bitmapset and check that all are set
> true in the amcanreturn array.
> 

check_index_filter() is a simplified version of check_index_only(), and
that calculates the bitmap this way.

> * There are some tiny functions that don't seem to add much value or
> have slightly weird APIs. For instance, match_filter_to_index() could
> probably just return a boolean, and maybe doesn't even need to exist
> because it's such a thin wrapper over check_index_filter(). Similarly
> for fix_indexfilter_clause(). I'm OK with tiny functions even if the
> only value is a comment, but I didn't find these particularly helpful.
> 

Yes, I agree some of this could be simplified. I only did the bare
minimum to get this bit working.

> * fix_indexfilter_references() could use a better comment. Perhaps
> refactor so that you can share code with fix_indexqual_references()?
> 

I don't think this can share code with fix_indexqual_references(),
because that changes the Var nodes to point to the index (because it
then gets translated to scan keys). The filters don't need that.

> * it looks like index filters are duplicated with ordinary filters, is
> there a reason for that?
> 

Good point. That used to be necessary, because the index-only filters
can be evaluated only on all-visible pages, and the filters had Vars
referencing the index tuple. We'd have to maintain another list of
clauses, which didn't seem worth it.

But now that the filters reference the heap tuple, we could leave them
out of the second list.

> * I'm confused about the relationship of an IOS to an index filter. It
> seems like the index filter only works for an ordinary index scan? Why
> is that?

What would it do for IOS? IOS evaluates all filters on the index tuple,
and it does not need the heap tuple at all (assuming allvisible=true).

Index-only filters try to do something like that even for regular index
scans, by evaluating as many expressions as possible on the index tuple,
but they may still require fetching the heap tuple in the end.


regards


[1]
https://www.postgresql.org/message-id/97985ef2-ef9b-e62e-6fd4-e00a573d4ead@enterprisedb.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From: Jeff Davis
Date:
On Wed, 2023-07-19 at 00:36 +0200, Tomas Vondra wrote:
> > * I'm confused about the relationship of an IOS to an index filter.
> > It
> > seems like the index filter only works for an ordinary index scan?
> > Why
> > is that?
>
> What would it do for IOS?

The way it's presented is slightly confusing. If you have table x with
an index on column i, then:

  EXPLAIN (ANALYZE, BUFFERS)
    SELECT i, j FROM x WHERE i = 7 and (i % 1000 = 7);

   Index Scan using x_idx on x  (cost=0.42..8.45 rows=1 width=8) (actual time=0.094..0.098 rows=1 loops=1)
     Index Cond: (i = 7)
     Index Filter: ((i % 1000) = 7)

But if you remove "j" from the target list, you get:

  EXPLAIN (ANALYZE, BUFFERS)
    SELECT i FROM x WHERE i = 7 and (i % 1000 = 7);

   Index Only Scan using x_idx on x  (cost=0.42..4.45 rows=1 width=4) (actual time=0.085..0.088 rows=1 loops=1)
      Index Cond: (i = 7)
      Filter: ((i % 1000) = 7)

This confused me at first because the "Filter" in the second plan means
the same thing as the "Index Filter" in the first plan. Should we call
the filter in an IOS an "Index Filter" too? Or is that redundant?

Regards,
    Jeff Davis




Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Date:

On 7/19/23 01:22, Jeff Davis wrote:
> On Wed, 2023-07-19 at 00:36 +0200, Tomas Vondra wrote:
>>> * I'm confused about the relationship of an IOS to an index filter.
>>> It
>>> seems like the index filter only works for an ordinary index scan?
>>> Why
>>> is that?
>>
>> What would it do for IOS?
> 
> The way it's presented is slightly confusing. If you have table x with
> and index on column i, then:
> 
>   EXPLAIN (ANALYZE, BUFFERS)
>     SELECT i, j FROM x WHERE i = 7 and (i % 1000 = 7);
> 
>    Index Scan using x_idx on x  (cost=0.42..8.45 rows=1 width=8)
> (actual time=0.094..0.098 rows=1 loops=1)
>      Index Cond: (i = 7)
>      Index Filter: ((i % 1000) = 7)
> 
> But if you remove "j" from the target list, you get:
> 
>   EXPLAIN (ANALYZE, BUFFERS)
>     SELECT i FROM x WHERE i = 7 and (i % 1000 = 7);
> 
>    Index Only Scan using x_idx on x  (cost=0.42..4.45 rows=1 width=4)
> (actual time=0.085..0.088 rows=1 loops=1)
>       Index Cond: (i = 7)
>       Filter: ((i % 1000) = 7)
> 
> This confused me at first because the "Filter" in the second plan means
> the same thing as the "Index Filter" in the first plan. Should we call
> the filter in an IOS an "Index Filter" too? Or is that redundant?
> 

I agree the naming in EXPLAIN is a bit confusing.

I wonder if Andres was right (in the index prefetch thread) that
splitting regular index scans and index-only scans may not be ideal. In
a way, this patch moves those nodes closer, both in capability and code
(because now both use index_getnext_tid etc.).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Jeff Davis
Date:
On Wed, 2023-07-19 at 11:16 +0200, Tomas Vondra wrote:
> I wonder if Andres was right (in the index prefetch thread) that
> splitting regular index scans and index-only scans may not be ideal.
> In
> a way, this patch moves those nodes closer, both in capability and
> code
> (because now both use index_getnext_tid etc.).

Yeah. I could also imagine decomposing the index scan node into more
pieces, but I don't think it would work out to be a clean data flow.
Either way, probably out of scope for this patch.

For this patch I think we should just tweak the EXPLAIN output so that
it's a little more clear what parts are index-only (at least if VM bit
is set) and what parts need to go to the heap.

Regards,
    Jeff Davis




Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:
On 7/19/23 19:17, Jeff Davis wrote:
> On Wed, 2023-07-19 at 11:16 +0200, Tomas Vondra wrote:
>> I wonder if Andres was right (in the index prefetch thread) that
>> splitting regular index scans and index-only scans may not be ideal.
>> In
>> a way, this patch moves those nodes closer, both in capability and
>> code
>> (because now both use index_getnext_tid etc.).
> 
> Yeah. I could also imagine decomposing the index scan node into more
> pieces, but I don't think it would work out to be a clean data flow.
> Either way, probably out of scope for this patch.
> 

OK

> For this patch I think we should just tweak the EXPLAIN output so that
> it's a little more clear what parts are index-only (at least if VM bit
> is set) and what parts need to go to the heap.
> 

Makes sense, I also need to think about maybe not having duplicate
clauses in the two lists. What annoys me about that is that it partially
prevents the cost-based reordering done by order_qual_clauses(). So
maybe we should have three lists ... Also, some of the expressions could
be fairly expensive.

BTW could you double-check how I expanded the index_getnext_slot()? I
recall I wasn't entirely confident the result is correct, and I wanted
to try getting rid of the "while (true)" loop.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Jeff Davis
Date:
On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote:
> Makes sense, I also need to think about maybe not having duplicate
> clauses in the two lists. What annoys me about that is that it
> partially prevents the cost-based reordering done by
> order_qual_clauses(). So maybe we should have three lists ... Also,
> some of the expressions could be fairly expensive.

Can we just calculate the costs of the pushdown and do it when it's a
win? If the random_page_cost savings exceed the costs from evaluating
the clause earlier, then push down.
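
For a rough sense of scale (using the default cost parameters): if
pushing a clause down means evaluating it on 100,000 index tuples
instead of only the 10,000 heap tuples that survive the other quals,
the extra CPU is about 90,000 * cpu_operator_cost (0.0025) ~= 225 cost
units, while avoiding even 1,000 heap page fetches is worth about
1,000 * random_page_cost (4.0) = 4,000 units. (The row counts here are
made up; the asymmetry is the point.)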

> BTW could you double-check how I expanded the index_getnext_slot()? I
> recall I wasn't entirely confident the result is correct, and I
> wanted
> to try getting rid of the "while (true)" loop.

I suggest refactoring slightly to have the two loops in different
functions (rather than nested loops in the same function) to make
control flow a bit more clear. I'm not sure if the new function for the
inner loop should be defined in nodeIndexscan.c or indexam.c; I suppose
it depends on how clean the signature looks.

Also please expand the tests a bit to show more EXPLAIN plans that
illustrate the different cases.
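
For example, something along these lines (just a sketch; the table and
index names are made up) would cover the key-column, INCLUDE-column,
and not-in-index cases:

  CREATE TABLE tst (a int, b int, c int);
  CREATE INDEX tst_a_incl_b ON tst (a) INCLUDE (b);

  -- INCLUDE column: can never be an index qual, only an index filter
  EXPLAIN (COSTS OFF)
    SELECT * FROM tst WHERE a > 100 AND b = 1 ORDER BY a;

  -- expression on the key column: index filter, not an index qual
  EXPLAIN (COSTS OFF)
    SELECT * FROM tst WHERE a > 100 AND (a % 10) = 1 ORDER BY a;

  -- column not in the index: must remain an ordinary (heap) filter
  EXPLAIN (COSTS OFF)
    SELECT * FROM tst WHERE a > 100 AND c = 1 ORDER BY a;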

Regards,
    Jeff Davis




Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Wed, Jul 19, 2023 at 1:46 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote:
> > Makes sense, I also need to think about maybe not having duplicate
> > clauses in the two lists. What annoys me about that is that it
> > partially prevents the cost-based reordering done by
> > order_qual_clauses(). So maybe we should have three lists ... Also,
> > some of the expressions could be fairly expensive.
>
> Can we just calculate the costs of the pushdown and do it when it's a
> win? If the random_page_cost savings exceed the costs from evaluating
> the clause earlier, then push down.

My patch that teaches nbtree to execute ScalarArrayOps intelligently
(by dynamically choosing to not re-descend the btree to perform
another "primitive index scan" when the data we need is located on the
same leaf page as the current ScalarArrayOps arrays) took an
interesting turn recently -- one that seems related.

I found that certain kinds of queries are dramatically faster once you
teach the optimizer to accept that multi-column ScalarArrayOps can be
trusted to return tuples in logical/index order, at least under some
circumstances. For example:

pg@regression:5555 [583930]=# create index order_by_saop on
tenk1(two,four,twenty);
CREATE INDEX

pg@regression:5555 [583930]=# EXPLAIN (ANALYZE, BUFFERS)
select ctid, thousand from tenk1
where two in (0,1) and four in (1,2) and twenty in (1,2)
order by two, four, twenty limit 20;

This shows "Buffers: shared hit=1377" on HEAD, versus "Buffers: shared
hit=13" with my patch. All because we can safely terminate the scan
early now. The vast majority of the buffer hits the patch will avoid
are against heap pages, even though I started out with the intention
of eliminating unnecessary repeat index page accesses.

Note that build_index_paths() currently refuses to allow SAOP clauses
to be recognized as ordered with a multi-column index and a query with
a clause for more than the leading column -- that is something that
the patch needs to address (to get this particular improvement, at
least). Allowing such an index path to have useful pathkeys is
typically safe (in the sense that it won't lead to wrong answers to
queries), but we still make a conservative assumption that they can
lead to wrong answers. There are comments about "equality constraints"
that describe the restrictions right now.

But it's not just the question of basic correctness -- the optimizer
is very hesitant to use multi-column SAOPs, even in cases that don't
care about ordering. So it's also (I think, implicitly) a question of
*risk*. The risk of getting very inefficient SAOP execution in nbtree,
when it turns out that a huge number of "primitive index scans" are
needed. But, if nbtree is taught to "coalesce together" primitive
index scans at runtime, that risk goes way down.

Anyway, this seems related to what you're talking about because the
relationship between selectivity and ordering seems particularly
important in this context. And because it suggests that there is at
least some scope for adding "run time insurance" to the executor,
which is valuable in the optimizer if it bounds the potential
downside. If you can be practically certain that there is essentially
zero risk of serious problems when the costing miscalculates (for a
limited subset of cases), then life might be a lot easier -- clearly
we should be biased in one particular direction with a case that has
that kind of general profile.

My current understanding of the optimizer side of things --
particularly things like costing for "filter quals/pqquals" versus
costing for "true index quals" -- is rather basic. That will have to
change. Curious to hear your thoughts (if any) on how what you're
discussing may relate to what I need to do with my patch. Right now my
patch assumes that making SAOP clauses into proper index quals (that
usually preserve index ordering) is an unalloyed good (when safe!).
This assumption is approximately true on average, as far as I can
tell. But it's probably quite untrue in various specific cases, that
somebody is bound to care about.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:

On 7/19/23 23:38, Peter Geoghegan wrote:
> On Wed, Jul 19, 2023 at 1:46 PM Jeff Davis <pgsql@j-davis.com> wrote:
>> On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote:
>>> Makes sense, I also need to think about maybe not having duplicate
>>> clauses in the two lists. What annoys me about that is that it
>>> partially prevents the cost-based reordering done by
>>> order_qual_clauses(). So maybe we should have three lists ... Also,
>>> some of the expressions could be fairly expensive.
>>
>> Can we just calculate the costs of the pushdown and do it when it's a
>> win? If the random_page_cost savings exceed the costs from evaluating
>> the clause earlier, then push down.
> 
> My patch that teaches nbtree to execute ScalarArrayOps intelligently
> (by dynamically choosing to not re-descend the btree to perform
> another "primitive index scan" when the data we need is located on the
> same leaf page as the current ScalarArrayOps arrays) took an
> interesting turn recently -- one that seems related.
> 
> I found that certain kinds of queries are dramatically faster once you
> teach the optimizer to accept that multi-column ScalarArrayOps can be
> trusted to return tuples in logical/index order, at least under some
> circumstances. For example:
> 
> pg@regression:5555 [583930]=# create index order_by_saop on
> tenk1(two,four,twenty);
> CREATE INDEX
> 
> pg@regression:5555 [583930]=# EXPLAIN (ANALYZE, BUFFERS)
> select ctid, thousand from tenk1
> where two in (0,1) and four in (1,2) and twenty in (1,2)
> order by two, four, twenty limit 20;
> 
> This shows "Buffers: shared hit=1377" on HEAD, versus "Buffers: shared
> hit=13" with my patch. All because we can safely terminate the scan
> early now. The vast majority of the buffer hits the patch will avoid
> are against heap pages, even though I started out with the intention
> of eliminating unnecessary repeat index page accesses.
> 
> Note that build_index_paths() currently refuses to allow SAOP clauses
> to be recognized as ordered with a multi-column index and a query with
> a clause for more than the leading column -- that is something that
> the patch needs to address (to get this particular improvement, at
> least). Allowing such an index path to have useful pathkeys is
> typically safe (in the sense that it won't lead to wrong answers to
> queries), but we still make a conservative assumption that they can
> lead to wrong answers. There are comments about "equality constraints"
> that describe the restrictions right now.
> 
> But it's not just the question of basic correctness -- the optimizer
> is very hesitant to use multi-column SAOPs, even in cases that don't
> care about ordering. So it's also (I think, implicitly) a question of
> *risk*. The risk of getting very inefficient SAOP execution in nbtree,
> when it turns out that a huge number of "primitive index scans" are
> needed. But, if nbtree is taught to "coalesce together" primitive
> index scans at runtime, that risk goes way down.
> 
> Anyway, this seems related to what you're talking about because the
> relationship between selectivity and ordering seems particularly
> important in this context. And because it suggests that there is at
> least some scope for adding "run time insurance" to the executor,
> which is valuable in the optimizer if it bounds the potential
> downside. If you can be practically certain that there is essentially
> zero risk of serious problems when the costing miscalculates (for a
> limited subset of cases), then life might be a lot easier -- clearly
> we should be biased in one particular direction with a case that has
> that kind of general profile.
> 
> My current understanding of the optimizer side of things --
> particularly things like costing for "filter quals/pqquals" versus
> costing for "true index quals" -- is rather basic. That will have to
> change. Curious to hear your thoughts (if any) on how what you're
> discussing may relate to what I need to do with my patch. Right now my
> patch assumes that making SAOP clauses into proper index quals (that
> usually preserve index ordering) is an unalloyed good (when safe!).
> This assumption is approximately true on average, as far as I can
> tell. But it's probably quite untrue in various specific cases, that
> somebody is bound to care about.
> 

I think the SAOP patch may need to be much more careful about this, but
for this patch it's simpler because it doesn't really change any of the
index internals, or the indexscan in general.

If I simplify that a bit, we're just reordering the clauses in a way to
maybe eliminate the heap fetch. The main risk seems to be that this will
force an expensive qual to the front of the list, just because it can be
evaluated on the index tuple. But the difference would need to be worse
than what we save by not doing the I/O - considering how expensive the
I/O is, that seems unlikely. Could happen for expensive quals that don't
really eliminate many rows, I guess.

Anyway, I see this as extension of what order_qual_clauses() does. The
main issue is that even order_qual_clauses() admits the estimates are
somewhat unreliable, so we can't expect to make perfect decisions.


FWIW: While reading order_qual_clauses() I realized the code may need to
be more careful about leakproof stuff. Essentially, if any of the
non-index clauses is leakproof, we can only do leakproof quals on the index.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Thu, Jul 20, 2023 at 4:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I think the SAOP patch may need to be much more careful about this, but
> for this patch it's simpler because it doesn't really change any of the
> index internals, or the indexscan in general.

It's true that the SAOP patch needs relatively complicated
infrastructure to assess whether or not the technique is safe with a
given set of quals. You cannot safely get an ordered index scan for
something like "select * from table where a < 5 and b in (1,2,3) order
by a, b". With or without my patch. My patch isn't really changing all
that much about the behavior in nbtree, as these things go. It's
*surprising* how little has to change about the high level structure
of index scans, in fact.

(Actually, I'm glossing over a lot. The MDAM paper describes
techniques that'd make even the really challenging cases safe, through
a process of converting quals from conjunctive normal form into
disjunctive normal form. This is more or less the form that the state
machine implemented by _bt_advance_array_keys() produces already,
today. But even with all this practically all of the heavy lifting
takes place before the index scan even begins, during preprocessing --
so you still require surprisingly few changes to index scans
themselves.)

> If I simplify that a bit, we're just reordering the clauses in a way to
> maybe eliminate the heap fetch. The main risk seems to be that this will
> force an expensive qual to the front of the list, just because it can be
> evaluated on the index tuple.

My example query might have been poorly chosen, because it involved a
limit. What I'm thinking of is more general than that.

> But the difference would need to be worse
> than what we save by not doing the I/O - considering how expensive the
> I/O is, that seems unlikely. Could happen for expensive quals that don't
> really eliminate many rows, I guess.

That sounds like the same principle that I was talking about. I think
that it can be pushed quite far, though. I am mostly talking about the
worst case, and it seems like you might not be.

You can easily construct examples where some kind of skew causes big
problems with a multi-column index. I'm thinking of indexes whose
leading columns are low cardinality, and queries where including the
second column as an index qual looks kind of marginal to the
optimizer. Each grouping represented in the most significant index
column might easily have its own unique characteristics; the
distribution of values in subsequent columns might be quite
inconsistent across each grouping, in whatever way.

Since nothing stops a qual on a lower order column having a wildly
different selectivity for one particular grouping, it might not make
sense to say that a problem in this area is due to a bad selectivity
estimate. Even if we have perfect estimates, what good can they do if
the optimal strategy is to vary our strategy at runtime, *within* an
individual index scan, as different parts of the key space (different
groupings) are traversed through? To skip or not to skip, say. This
isn't about picking the cheapest plan, really.

That's another huge advantage of index quals -- they can (at least in
principle) allow you skip over big parts of the index when it just
ends up making sense, in whatever way, for whatever reason. In the
index, and in the heap. Often both. You'd likely always prefer to err
in the direction of having more index quals rather than fewer, when
doing so doesn't substantially change the plan itself. It could be
very cheap insurance, even without any limit. (It would probably also
be a lot faster, but it needn't be.)

> Anyway, I see this as extension of what order_qual_clauses() does. The
> main issue is that even order_qual_clauses() admits the estimates are
> somewhat unreliable, so we can't expect to make perfect decisions.

The attribute value independence assumption is wishful thinking, in no
small part -- it's quite surprising that it works as well as it does,
really.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:
On 7/21/23 05:32, Peter Geoghegan wrote:
> On Thu, Jul 20, 2023 at 4:35 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I think the SAOP patch may need to be much more careful about this, but
>> for this patch it's simpler because it doesn't really change any of the
>> index internals, or the indexscan in general.
> 
> It's true that the SAOP patch needs relatively complicated
> infrastructure to assess whether or not the technique is safe with a
> given set of quals. You cannot safely get an ordered index scan for
> something like "select * from table where a < 5 and b in (1,2,3) order
> by a, b". With or without my patch. My patch isn't really changing all
> that much about the behavior in nbtree, as these things go. It's
> *surprising* how little has to change about the high level structure
> of index scans, in fact.
> 
> (Actually, I'm glossing over a lot. The MDAM paper describes
> techniques that'd make even the really challenging cases safe, through
> a process of converting quals from conjunctive normal form into
> disjunctive normal form. This is more or less the form that the state
> machine implemented by _bt_advance_array_keys() produces already,
> today. But even with all this practically all of the heavy lifting
> takes place before the index scan even begins, during preprocessing --
> so you still require surprisingly few changes to index scans
> themselves.)
> 

Ah, OK. I was assuming the execution might be more complex. But I was
thinking more about the costing part - if you convert the clauses in
some way, does that affect the reliability of estimates somehow? If the
conversion from AND to OR makes the list of clauses more complex, that
might be an issue ...

The index-only filter does no such conversion, it just uses the same
clauses as before.

>> If I simplify that a bit, we're just reordering the clauses in a way to
>> maybe eliminate the heap fetch. The main risk seems to be that this will
>> force an expensive qual to the front of the list, just because it can be
>> evaluated on the index tuple.
> 
> My example query might have been poorly chosen, because it involved a
> limit. What I'm thinking of is more general than that.
> 

I wasn't really thinking about LIMIT, and I don't think it changes the
overall behavior very much (sure, it's damn difficult to estimate for
skewed data sets, but meh).

The case I had in mind is something like this:

CREATE TABLE t (a int, b int, c int);
CREATE INDEX ON t (a);
INSERT INTO t SELECT i, i, i FROM generate_series(1,1000000) s(i);

SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a;

where bad_qual is expensive and matches almost all rows. Without the
index-only filters, we'd evaluate the conditions in this order

 [b < 1], [c < 1], [bad_qual(a)]

but with naive index-only filters we do this:

 [bad_qual(a)], [b < 1], [c < 1]

which is bad as it runs the expensive thing on every row.
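
(For concreteness, bad_qual could be declared along these lines -- just
a sketch, where the body is a stand-in for something genuinely
expensive:

  CREATE FUNCTION bad_qual(x int) RETURNS bool
    LANGUAGE sql IMMUTABLE COST 1000000
  AS 'SELECT x > -1';  -- matches every row, declared very expensive

so the planner knows it's expensive, yet naive index-only filtering
would still run it first.)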

FWIW the "ORDER BY" is necessary, because otherwise we may not even
build the index path (no index keys, no interesting pathkeys). It's just
an opportunistic optimization - if already doing index scan, try doing
this too. I wonder if we should relax that ...

>> But the difference would need to be worse
>> than what we save by not doing the I/O - considering how expensive the
>> I/O is, that seems unlikely. Could happen for expensive quals that don't
>> really eliminate many rows, I guess.
> 
> That sounds like the same principle that I was talking about. I think
> that it can be pushed quite far, though. I am mostly talking about the
> worst case, and it seems like you might not be.
> 

Yeah, I was proposing the usual costing approach, based on "average"
behavior. It has the usual weakness that if the estimates are far off,
we can have issues. There have been various discussions about maybe
considering how reliable the estimates are, to defend against that.
Which is tough to do.

In a way, focusing on the worst case does that by assuming the worst
combination - which is fine, although it may choose the slower (but
safer) approach in some cases.

> You can easily construct examples where some kind of skew causes big
> problems with a multi-column index. I'm thinking of indexes whose
> leading columns are low cardinality, and queries where including the
> second column as an index qual looks kind of marginal to the
> optimizer. Each grouping represented in the most significant index
> column might easily have its own unique characteristics; the
> distribution of values in subsequent columns might be quite
> inconsistent across each grouping, in whatever way.
> 
> Since nothing stops a qual on a lower order column having a wildly
> different selectivity for one particular grouping, it might not make
> sense to say that a problem in this area is due to a bad selectivity
> estimate. Even if we have perfect estimates, what good can they do if
> the optimal strategy is to vary our strategy at runtime, *within* an
> individual index scan, as different parts of the key space (different
> groupings) are traversed through? To skip or not to skip, say. This
> isn't about picking the cheapest plan, really.
> 

Well, yeah. It's one thing to assign some cost estimate to the plan, and
quite another what actually happens at runtime. It would be nice to be
able to reflect the expected runtime behavior in the cost (otherwise you
get confused users complaining that we pick the "wrong" plan).

> That's another huge advantage of index quals -- they can (at least in
> principle) allow you skip over big parts of the index when it just
> ends up making sense, in whatever way, for whatever reason. In the
> index, and in the heap. Often both. You'd likely always prefer to err
> in the direction of having more index quals rather than fewer, when
> doing so doesn't substantially change the plan itself. It could be
> very cheap insurance, even without any limit. (It would probably also
> be a lot faster, but it needn't be.)
> 

True, although my patch doesn't change the number of index quals. It's
just about the other quals.

>> Anyway, I see this as extension of what order_qual_clauses() does. The
>> main issue is that even order_qual_clauses() admits the estimates are
>> somewhat unreliable, so we can't expect to make perfect decisions.
> 
> The attribute value independence assumption is wishful thinking, in no
> small part -- it's quite surprising that it works as well as it does,
> really.
> 

Yeah. For OLTP it works pretty well, for OLAP not so much.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Fri, Jul 21, 2023 at 4:52 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> > (Actually, I'm glossing over a lot. The MDAM paper describes
> > techniques that'd make even the really challenging cases safe, through
> > a process of converting quals from conjunctive normal form into
> > disjunctive normal form. This is more or less the form that the state
> > machine implemented by _bt_advance_array_keys() produces already,
> > today. But even with all this practically all of the heavy lifting
> > takes place before the index scan even begins, during preprocessing --
> > so you still require surprisingly few changes to index scans
> > themselves.)
> >
>
> Ah, OK. I was assuming the execution might be more complex.

Sort of. Execution of individual "primitive index scans" effectively
works the same way as it does already -- the preprocessing is required
to generate disjunctive primitive index scans that look like one big
index scan when combined (which is how native execution of SAOPs by
nbtree works today).

The challenge during execution of index scans (execution proper, not
preprocessing) comes from processing a "flattened" DNF representation
of your original quals efficiently. If you have (say) 3 SAOPs, then
the total number of distinct DNF quals is the cartesian product of the
3 arrays -- which is multiplicative. But, you can skip over the
single-value DNF quals quickly when they have no matches. Which isn't
all that hard.

We get some very useful invariants with these DNF quals: you can have
at most one individual DNF qual as a match for any individual index
tuple. Plus the quals are materialized in key space order, which is
ideally suited to processing by an ordered scan. So just as you can
use the array keys to skip over parts of the index when searching for
an index tuple, you can use an index tuple to skip over the arrays
when searching for the next relevant set of array keys. It works both
ways!
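
To make that concrete with a tiny sketch: with two SAOPs, the cartesian
product gives 2 x 2 = 4 single-value DNF quals, materialized in key
space order:

  two in (0, 1) and four in (1, 2)

  -- becomes, conceptually:
  --   (two = 0 and four = 1) or (two = 0 and four = 2) or
  --   (two = 1 and four = 1) or (two = 1 and four = 2)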

> But I was
> thinking more about the costing part - if you convert the clauses in
> some way, does that affect the reliability of estimates somehow?

Obviously, it doesn't affect the selectivity at all. That seems most
important (you kinda said the same thing yourself).

> If the
> conversion from AND to OR makes the list of clauses more complex, that
> might be an issue ...

That's definitely a concern. Even still, the biggest problem by far in
this general area is selectivity estimation. Which, in a way, can be
made a lot easier by this general approach.

Let's say we have the tenk1 table, with the same composite index as in
my example upthread (on "(two,four,twenty)"). Further suppose you have
a very simple query: "select count(*) from tenk1". On master (and with
the patch) that's going to give you an index-only scan on the
composite index (assuming it's the only index), which is quite
efficient despite being a full index scan -- 11 buffer hits.

This much more complicated count(*) query is where it gets interesting:

select
  count(*),
  two,
  four,
  twenty
from
  tenk1_dyn_saop
where
  two in (0, 1)
  and four in (1, 2, 3, 4)
  and twenty in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
group by
  two,
  four,
  twenty
order by
  two,
  four,
  twenty;

It's inherently very difficult to predict how selective this query
will be using basic statistics. But maybe it doesn't need to matter so
much, so often.

The patch can execute this with an index-only scan + GroupAggregate.
What ends up happening is that the patch gets 9 buffer hits -- so
pretty close to 11. Master can use almost the same query plan (it uses
quals but needs a hashagg + sort). It ends up getting 245 buffer hits
-- vastly more than what we see for the full index scan case (and
nothing to do with the sort/an issue with a limit). That's nearly as
many hits as you'd get with a sequential scan. (BTW, I don't need to
coax the query planner to get this result on master.)

With the patch you can vary the predicate in whatever way, so that the
selectivity shifts up or down. Occasionally you'll get maybe one extra
buffer access relative to the base full index scan case, but overall
the patch makes the worst case look very much like a full index scan
(plus some relatively tiny CPU overhead). This is just common sense,
in a way; selectivities are always between 0.0 and 1.0. Why shouldn't
we be able to think about it like that?

> I wasn't really thinking about LIMIT, and I don't think it changes the
> overall behavior very much (sure, it's damn difficult to estimate for
> skewed data sets, but meh).
>
> The case I had in mind is something like this:
>
> CREATE TABLE t (a int, b int, c int);
> CREATE INDEX ON t (a);
> INSERT INTO t SELECT i, i, i FROM generate_series(1,1000000) s(i);
>
> SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a;
>
> where bad_qual is expensive and matches almost all rows.

You must distinguish between quals that can become required scan keys
(or can terminate the scan), and all other quals. This is really
important for my pending SAOP patch, but I think it might be important
here too. I wonder if the best place to address the possibility of
such a regression is in the index AM itself.

Let's make your example a bit more concrete: let's assume that
bad_qual is a very expensive integer comparison, against a column that
has only one possible value. So now your example becomes:

CREATE TABLE t (a expensive_int, b int, c int);
CREATE INDEX ON t (a);
INSERT INTO t SELECT 42, i, i FROM generate_series(1,1000000) s(i);
SELECT * FROM t WHERE a in (7, 42) AND b < 1 AND c < 1 ORDER BY a;

(I'm using a SAOP here because the planner will more or less disregard
the ORDER BY if I make it "= 42" instead. Maybe that makes it
simpler.)

Sure, you're getting a full index scan here, and you get all these
useless comparisons on "a" -- that's a real risk. But AFAICT there is
no real need for it. There is another nbtree patch that might help. A
patch that teaches nbtree's _bt_readpage function to skip useless
comparisons like this:

https://postgr.es/m/079c3f8e-3371-abe2-e93c-fc8a0ae3f571@garret.ru

In order for this kind of optimization to be possible at all, we must
be able to reason about "a" as a column whose values will always be in
key space order. That is, nbtree must recognize that "a" is the most
significant key column, not (say) a low-order column from a composite
index -- it's a required column in both directions. If _bt_readpage
can determine that the first tuple on a leaf page has the value "42",
and the high key has that same value, then we can skip all of the
comparisons of "a" for that page, right away (in fact we don't require
any comparisons). Now it doesn't matter that they're super expensive
comparisons (or it hardly matters).

It's natural to think of things like this _bt_readpage optimization as
something that makes existing types of plan shapes run faster. But you
can also think of them as things that make new and fundamentally
better plan shapes feasible, by making risky things much less risky.

> FWIW the "ORDER BY" is necessary, because otherwise we may not even
> build the index path (no index keys, no interesting pathkeys). It's just
> an opportunistic optimization - if already doing index scan, try doing
> this too. I wonder if we should relax that ...

I'm kinda doing the same thing with ordering in my own patch. In
general, even if the query really doesn't care about the index order,
there may be a lot of value in making the nbtree code understand that
this is an ordered index scan. That's what enables skipping, in all
its forms (skipping individual comparisons, skipping whole subsections
of the index, etc).

I'm not saying that this is 100% problem free. But it seems like a
promising high level direction.

> In a way, focusing on the worst case does that by assuming the worst
> combination - which is fine, although it may choose the slower (but
> safer) approach in some cases.

I don't think that it has to be slower on average (even by a tiny
bit). It might just end up being slightly faster on average, and way
faster on occasion.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:

On 7/21/23 21:17, Peter Geoghegan wrote:
> ...
>> But I was
>> thinking more about the costing part - if you convert the clauses in
>> some way, does that affect the reliability of estimates somehow?
> 
> Obviously, it doesn't affect the selectivity at all. That seems most
> important (you kinda said the same thing yourself).
> 

Sorry, I think I meant 'cost estimates', not the selectivity estimates.
If you convert the original "simple" clauses into the more complex list,
presumably we'd cost that differently, right? I may be entirely wrong,
but my intuition is that costing these tiny clauses will be much more
difficult than costing the original clauses.

>> If the
>> conversion from AND to OR makes the list of clauses more complex, that
>> might be an issue ...
> 
> That's definitely a concern. Even still, the biggest problem by far in
> this general area is selectivity estimation. Which, in a way, can be
> made a lot easier by this general approach.
> 
> Let's say we have the tenk1 table, with the same composite index as in
> my example upthread (on "(two,four,twenty)"). Further suppose you have
> a very simple query: "select count(*) from tenk1". On master (and with
> the patch) that's going to give you an index-only scan on the
> composite index (assuming it's the only index), which is quite
> efficient despite being a full index scan -- 11 buffer hits.
> 
> This much more complicated count(*) query is where it gets interesting:
> 
> select
>   count(*),
>   two,
>   four,
>   twenty
> from
>   tenk1_dyn_saop
> where
>   two in (0, 1)
>   and four in (1, 2, 3, 4)
>   and twenty in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
> group by
>   two,
>   four,
>   twenty
> order by
>   two,
>   four,
>   twenty;
> 
> It's inherently very difficult to predict how selective this query
> will be using basic statistics. But maybe it doesn't need to matter so
> much, so often.
> 
> The patch can execute this with an index-only scan + GroupAggregate.
> What ends up happening is that the patch gets 9 buffer hits -- so
> pretty close to 11. Master can use almost the same query plan (it uses
> quals but needs a hashagg + sort). It ends up getting 245 buffer hits
> -- vastly more than what we see for the full index scan case (and
> nothing to do with the sort/an issue with a limit). That's nearly as
> many hits as you'd get with a sequential scan. (BTW, I don't need to
> coax the query planner to get this result on master.)
> 
> With the patch you can vary the predicate in whatever way, so that the
> selectivity shifts up or down. Occasionally you'll get maybe one extra
> buffer access relative to the base full index scan case, but overall
> the patch makes the worst case look very much like a full index scan
> (plus some relatively tiny CPU overhead). This is just common sense,
> in a way; selectivities are always between 0.0 and 1.0. Why shouldn't
> we be able to think about it like that?
> 

Right, I agree with this reasoning in principle.

But I'm getting a bit lost regarding what's the proposed costing
strategy. It's hard to follow threads spanning days, with various other
distractions, etc.

In principle, I think:

a) If we estimate the scan to return almost everything (or rather if we
expect it to visit almost the whole index), it makes perfect sense to
cost it as a full index scan.

b) What should we do if we expect to read only a fraction of the index?
If we're optimistic and cost it according to the estimates, but then end
up reading most of the index, how bad could it be (compared to the
optimal plan choice)? Similarly, if we're pessimistic/defensive and cost
it as a full index scan, how many "good" cases would we reject based on
the artificially high cost estimate?

I don't have a very good idea how sensitive the cost is to selectivity
changes, which I think is crucial for making judgments.
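
One crude way to build that intuition (a sketch, assuming an index like
"CREATE INDEX ON t (a) INCLUDE (b)" so the qual is a pushdown candidate)
is to watch the estimated and actual costs as the qual's selectivity
shifts:

  EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM t WHERE b < 10 ORDER BY a;      -- highly selective
  EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM t WHERE b < 500000 ORDER BY a;  -- ~half the rows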

>> I wasn't really thinking about LIMIT, and I don't think it changes the
>> overall behavior very much (sure, it's damn difficult to estimate for
>> skewed data sets, but meh).
>>
>> The case I had in mind is something like this:
>>
>> CREATE TABLE t (a int, b int, c int);
>> CREATE INDEX ON t (a);
>> INSERT INTO t SELECT i, i, i FROM generate_series(1,1000000) s(i);
>>
>> SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a;
>>
>> where bad_qual is expensive and matches almost all rows.
> 
> You must distinguish between quals that can become required scan keys
> (or can terminate the scan), and all other quals. This is really
> important for my pending SAOP patch, but I think it might be important
> here too. I wonder if the best place to address the possibility of
> such a regression is in the index AM itself.
> 
> Let's make your example a bit more concrete: let's assume that
> bad_qual is a very expensive integer comparison, against a column that
> has only one possible value. So now your example becomes:
> 
> CREATE TABLE t (a expensive_int, b int, c int);
> CREATE INDEX ON t (a);
> INSERT INTO t SELECT 42, i, i FROM generate_series(1,1000000) s(i);
> SELECT * FROM t WHERE a in (7, 42) AND b < 1 AND c < 1 ORDER BY a;
> 
> (I'm using a SAOP here because the planner will more or less disregard
> the ORDER BY if I make it "= 42" instead. Maybe that makes it
> simpler.)
> 
> Sure, you're getting a full index scan here, and you get all these
> useless comparisons on "a" -- that's a real risk. But AFAICT there is
> no real need for it. There is another nbtree patch that might help. A
> patch that teaches nbtree's _bt_readpage function to skip useless
> comparisons like this:
> 
> https://postgr.es/m/079c3f8e-3371-abe2-e93c-fc8a0ae3f571@garret.ru
> 
> In order for this kind of optimization to be possible at all, we must
> be able to reason about "a" as a column whose values will always be in
> key space order. That is, nbtree must recognize that "a" is the most
> significant key column, not (say) a low-order column from a composite
> index -- it's a required column in both directions. If _bt_readpage
> can determine that the first tuple on a leaf page has the value "42",
> and the high key has that same value, then we can skip all of the
> comparisons of "a" for that page, right away (in fact we don't require
> any comparisons). Now it doesn't matter that they're super expensive
> comparisons (or it hardly matters).
> 
> It's natural to think of things like this _bt_readpage optimization as
> something that makes existing types of plan shapes run faster. But you
> can also think of them as things that make new and fundamentally
> better plan shapes feasible, by making risky things much less risky.
> 

That'd be an interesting optimization, but I don't think that matters
for this patch, as it's not messing with index scan keys at all. I mean,
it does not affect what scan keys get passed to the AM at all, or what
scan keys are required. And it does not influence what the AM does. So
all this seems interesting, but rather orthogonal to this patch.


>> FWIW the "ORDER BY" is necessary, because otherwise we may not even
>> build the index path (no index keys, no interesting pathkeys). It's just
>> an opportunistic optimization - if already doing index scan, try doing
>> this too. I wonder if we should relax that ...
> 
> I'm kinda doing the same thing with ordering in my own patch. In
> general, even if the query really doesn't care about the index order,
> there may be a lot of value in making the nbtree code understand that
> this is an ordered index scan. That's what enables skipping, in all
> its forms (skipping individual comparisons, skipping whole subsections
> of the index, etc).
> 
> I'm not saying that this is 100% problem free. But it seems like a
> promising high level direction.
> 

I was rather thinking about maybe relaxing the rules about which index
paths we create, to include indexes that might be interesting thanks to
index-only filters (unless already picked thanks to sorting).
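
Sketching with the earlier example table (and assuming an index like
"CREATE INDEX ON t (a) INCLUDE (b)"), that would mean even a query with
no index quals and no interesting pathkeys, say

  SELECT * FROM t WHERE b < 1;

could still get an index path where "b < 1" is evaluated as an
index-only filter to skip heap fetches.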

>> In a way, focusing on the worst case does that by assuming the worst
>> combination - which is fine, although it may choose the slower (but
>> safer) approach in some cases.
> 
> I don't think that it has to be slower on average (even by a tiny
> bit). It might just end up being slightly faster on average, and way
> faster on occasion.
> 

OK, that'd be nice. I don't have a very good intuition about behavior
for these queries, I'd need to play with & benchmark it the way I did
for the prefetching patch etc.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Sun, Jul 23, 2023 at 5:04 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Sorry, I think I meant 'cost estimates', not the selectivity estimates.
> If you convert the original "simple" clauses into the more complex list,
> presumably we'd cost that differently, right? I may be entirely wrong,
> but my intuition is that costing these tiny clauses will be much more
> difficult than costing the original clauses.

I think that that's definitely true (it is more difficult), but that
there may be a bigger picture.

> Right, I agree with this reasoning in principle.
>
> But I'm getting a bit lost regarding what's the proposed costing
> strategy.

To be clear, I don't have a clue how to better estimate the
cardinality of multiple attributes from a composite index. The big and
immediate change to the SAOP costing with my patch is that
genericcostestimate/btcostestimate can safely assume that visiting
each leaf page more than once is a physical impossibility. Because it
is. It is no longer necessary to treat SAOPs similarly to a nested
loop join during costing, which is how it works today.

Now, whenever you add increasingly complicated clauses to a
multi-column SAOP query (like the ones I've shown you), it makes sense
for the cost to "saturate" at a certain point. That should be
representative of the physical reality, for both CPU costs and I/O
costs. Right now the worst case is really relevant to the average
case, since the risk of the costs just exploding at runtime is very
real.

If the only problem in this area was the repeated accesses to the same
leaf pages (accesses that happen in very close succession anyway),
then all of this would be a nice win, but not much more. It certainly
wouldn't be expected to change the way we think about stuff that isn't
directly and obviously relevant. But, it's not just the index pages.
Once you start to consider the interactions with filter/qpquals, it
gets much more interesting. Now you're talking about completely
avoiding physical I/Os for heap accesses, which has the potential to
make a dramatic difference to some types of queries, particularly in
the worst case.

> It's hard to follow threads spanning days, with various other
> distractions, etc.

I have to admit that my thinking on this very high level stuff is
rather unrefined. As much as anything else, I'm trying to invent (or
discover) a shared vocabulary for discussing these issues. I might
have gone about it clumsily, at times. I appreciate being able to
bounce this stuff off you.

> I don't have a very good idea how sensitive the cost is to selectivity
> changes, which I think is crucial for making judgments.

I'm not trying to find a way for the optimizer to make better
judgments about costing with a multi-column index. What I'm suggesting
(rather tentatively) is to find a way for it to make fewer (even no)
judgements at all.

If you can find a way of reducing the number of possible choices
without any real downside -- in particular by just not producing index
paths that cannot possibly be a win in some cases -- then you reduce
the number of bad choices. The challenge is making that kind of
approach in the optimizer actually representative of the executor in
the real world. The executor has to have robust performance under a
variety of runtime conditions for any of this to make sense.

> > It's natural to think of things like this _bt_readpage optimization as
> > something that makes existing types of plan shapes run faster. But you
> > can also think of them as things that make new and fundamentally
> > better plan shapes feasible, by making risky things much less risky.
> >
>
> That'd be an interesting optimization, but I don't think that matters
> for this patch, as it's not messing with index scan keys at all. I mean,
> it does not affect what scan keys get passed to the AM at all, or what
> scan keys are required. And it does not influence what the AM does. So
> all this seems interesting, but rather orthogonal to this patch.

Your patch is approximately the opposite of what I'm talking about, in
terms of its structure. The important commonality is that each patch
adds "superfluous" quals that can be very useful at runtime, under the
right circumstances -- which can be hard to predict. Another
similarity is that both patches inspire some of the same kind of
lingering doubts about extreme cases -- cases where (somehow) the cost
asymmetry that usually works in our favor doesn't apply.

My current plan is to post v1 of my patch early next week. It would be
better to discuss this on the thread that I create for that patch.

You're right that "exploiting index ordering" on the index AM side is
totally unrelated to your patch. Sorry about that.

> I was rather thinking about maybe relaxing the rules about which index
> paths we create, to include indexes that might be interesting thanks to
> index-only filters (unless already picked thanks to sorting).

That seems like it would make sense. In general I think that we
overuse and over rely on bitmap index scans -- we should try to chip
away at the artificial advantages that bitmap index scans have, so
that we can get the benefit of index scans more often -- accessing the
heap/VM inline does open up a lot of possibilities that bitmap scans
will never be able to match. (BTW, I'm hoping that your work on index
prefetching will help with that.)

I see that your patch has this diff change (which is 1 out of only 2
or 3 plan changes needed by the patch):

--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1838,18 +1838,13 @@ DROP TABLE onek_with_null;
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
-                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
- Bitmap Heap Scan on tenk1
-   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3)) OR ((thousand = 42) AND (tenthous = 42)))
-   ->  BitmapOr
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 1))
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 3))
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 42))
-(9 rows)
+                              QUERY PLAN
+-----------------------------------------------------------------------
+ Index Scan using tenk1_thous_tenthous on tenk1
+   Index Cond: (thousand = 42)
+   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
+   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
+(4 rows)

That does seem like an improvement to me. But, an even better plan is
possible. Or would be possible once this SAOP-OR-transformation patch
is in place (if it was then combined with my own SAOP patch):


https://www.postgresql.org/message-id/flat/919bfbcb-f812-758d-d687-71f89f0d9a68%40postgrespro.ru#9d877caf48c4e331e507b5c63914228e

That could give us an index scan plan that is perfectly efficient.
Like the new plan shown here, it would pin/lock a single leaf page
from the tenk1_thous_tenthous index, once. But unlike the plan shown
here, it would be able to terminate as soon as the index scan reached
an index tuple where thousand=42 and tenthous>42. That makes a
significant difference if we have to do heap accesses for those extra
tuples.

Plus this hypothetical other plan of mine would be more robust: it
would tolerate misestimates. It happens to be the case that there just
aren't that many tuples with thousand=42 -- they take up only a
fraction of one leaf page. But...why take a chance on that being true,
if we don't have to? The old bitmap index scan plan has this same
advantage already -- it is robust in the very same way. Because it
managed to pass down specific "tenthous" index quals. It would be nice
to have both advantages, together. (To be clear, I'm certainly not
suggesting that the example argues against what you want to do here --
it's just an example that jumped out at me.)
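
FWIW, hand-rewriting those ORs as a SAOP already yields the single-scan
shape with both columns as index quals -- a quick sketch (plan output
approximate):

  EXPLAIN (COSTS OFF)
  SELECT * FROM tenk1
    WHERE thousand = 42 AND tenthous IN (1, 3, 42);

  --  Index Scan using tenk1_thous_tenthous on tenk1
  --    Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3,42}'::integer[])))

The transformation patch is about getting there automatically from the
OR form; the early termination described above is the part that still
needs my SAOP patch.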

Perhaps this example will make my confusion about the boundaries
between each of our patches a bit more understandable. I was confused
-- and I still am. I look forward to being less confused at some point
in the future.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Jeff Davis
Date:
On Sun, 2023-07-23 at 14:04 +0200, Tomas Vondra wrote:
> But I'm getting a bit lost regarding what's the proposed costing
> strategy. It's hard to follow threads spanning days, with various
> other
> distractions, etc.

I'm getting a bit lost in this discussion as well -- for the purposes
of this feature, we only need to know whether to push down a clause as
an Index Filter or not, right?

Could we start out conservatively and push down as an Index Filter
unless there is some other clause ahead of it that can't be pushed
down? That would allow users to have some control by writing clauses in
the desired order or wrapping them in functions with a declared cost.
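
For instance (a sketch reusing the table from upthread; the function
name is made up), a user could write:

  CREATE FUNCTION expensive_check(x int) RETURNS bool
    LANGUAGE sql IMMUTABLE COST 1000
  AS 'SELECT (x % 1000) = 7';

  SELECT i, j FROM x WHERE i = 7 AND expensive_check(i);

and the declared COST would keep the wrapped clause ordered after the
cheaper ones.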

Regards,
    Jeff Davis




Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Mon, Jul 24, 2023 at 10:36 AM Jeff Davis <pgsql@j-davis.com> wrote:
> I'm getting a bit lost in this discussion as well -- for the purposes
> of this feature, we only need to know whether to push down a clause as
> an Index Filter or not, right?

I think so.

> Could we start out conservatively and push down as an Index Filter
> unless there is some other clause ahead of it that can't be pushed
> down? That would allow users to have some control by writing clauses in
> the desired order or wrapping them in functions with a declared cost.

I'm a bit concerned about cases like the one I described from the
regression tests.

The case in question shows a cheaper plan replacing a more expensive
plan -- so it's a win by every conventional measure. But, the new plan
is less robust in the sense that I described yesterday: it will be
much slower than the current plan when there happens to be many more
"thousand = 42" tuples than expected. We have a very high chance of a
small benefit (we save repeated index page accesses), but a very low
chance of a high cost (we incur many more heap accesses). Which seems
less than ideal.

One obvious way of avoiding that problem (that's probably overly
conservative) is to just focus on the original complaint from Maxim.
The original concern was limited to non-key columns from INCLUDE
indexes. If you only apply the optimization there then you don't run
the risk of generating a path that "out competes" a more robust path
in the sense that I've described. This is obviously true because there
can't possibly be index quals/scan keys for non-key columns within the
index AM.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Jul 24, 2023 at 10:36 AM Jeff Davis <pgsql@j-davis.com> wrote:
>> Could we start out conservatively and push down as an Index Filter
>> unless there is some other clause ahead of it that can't be pushed
>> down? That would allow users to have some control by writing clauses in
>> the desired order or wrapping them in functions with a declared cost.

> I'm a bit concerned about cases like the one I described from the
> regression tests.

Please do not put in any code that assumes that restriction clause
order is preserved, or encourages users to think it is.  There are
already cases where that's not so, eg equivalence clauses tend to
get shuffled around.

            regards, tom lane



Re: Use of additional index columns in rows filtering

From
Jeff Davis
Date:
On Mon, 2023-07-24 at 13:58 -0400, Tom Lane wrote:
> Please do not put in any code that assumes that restriction clause
> order is preserved, or encourages users to think it is.

Agreed. I didn't mean to add any extra guarantee of preserving clause
order; just to follow the current way order_qual_clauses() works, which
has a comment saying:

"So we just order by security level then estimated per-tuple cost,
being careful not to change the order when (as is often the case) the
estimates are identical."

I assumed that the reason for "being careful" above was to not
unnecessarily override how the user writes the qual clauses, but
perhaps there's another reason?

Regardless, my point was just to make minimal changes now that are
unlikely to cause regressions. If we come up with better ways of
ordering the clauses later, that could be part of a separate change. (I
think Peter G. is pointing out a complication with that idea, to which
I'll respond separately.)

Regards,
    Jeff Davis




Re: Use of additional index columns in rows filtering

From
Jeff Davis
Date:
On Mon, 2023-07-24 at 10:54 -0700, Peter Geoghegan wrote:
> The case in question shows a cheaper plan replacing a more expensive
> plan -- so it's a win by every conventional measure. But, the new
> plan
> is less robust in the sense that I described yesterday: it will be
> much slower than the current plan when there happens to be many more
> "thousand = 42" tuples than expected. We have a very high chance of a
> small benefit (we save repeated index page accesses), but a very low
> chance of a high cost (we incur many more heap accesses). Which seems
> less than ideal.

I see. You're concerned that lowering the cost of an index scan path
too much, due to pushing down a clause as an Index Filter, could cause
it to out-compete a more "robust" plan.

That might be true but I'm not sure what to do about that unless we
incorporate some "robustness" measure into the costing. If every
measure we have says one plan is better, don't we have to choose it?

> The original concern was limited to non-key columns from INCLUDE
> indexes. If you only apply the optimization there then you don't run
> the risk of generating a path that "out competes" a more robust path
> in the sense that I've described. This is obviously true because
> there
> can't possibly be index quals/scan keys for non-key columns within
> the
> index AM.

If I understand correctly, you mean we couldn't use an Index Filter on
a key column? That seems overly restrictive, there are plenty of
clauses that might be useful as an Index Filter but cannot be an Index
Cond for one reason or another.
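
For example (a sketch, reusing the table from upthread): an expression
over a key column can't be an Index Cond, but its inputs are right there
in the index tuple:

  SELECT i FROM x WHERE i > 0 AND (i % 1000) = 7;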

Regards,
    Jeff Davis




Re: Use of additional index columns in rows filtering

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2023-07-24 at 13:58 -0400, Tom Lane wrote:
>> Please do not put in any code that assumes that restriction clause
>> order is preserved, or encourages users to think it is.

> Agreed. I didn't mean to add any extra guarantee of preserving clause
> order; just to follow the current way order_qual_clauses() works, which
> has a comment saying:
> "So we just order by security level then estimated per-tuple cost,
> being careful not to change the order when (as is often the case) the
> estimates are identical."
> I assumed that the reason for "being careful" above was to not
> unnecessarily override how the user writes the qual clauses, but
> perhaps there's another reason?

I think the point was just to not make any unnecessary behavioral
changes from the way things were before we added that sorting logic.
But there are other places that will result in clause ordering changes,
plus there's the whole business of possibly-intermixed restriction and
join clauses.

            regards, tom lane



Re: Use of additional index columns in rows filtering

От
Peter Geoghegan
Дата:
On Mon, Jul 24, 2023 at 11:37 AM Jeff Davis <pgsql@j-davis.com> wrote:
> I see. You're concerned that lowering the cost of an index scan path
> too much, due to pushing down a clause as an Index Filter, could cause
> it to out-compete a more "robust" plan.

The optimizer correctly determines that 3 index scans (plus a bitmap
OR node) are more expensive than 1 very similar index scan. It's hard
to argue with that.

> That might be true but I'm not sure what to do about that unless we
> incorporate some "robustness" measure into the costing. If every
> measure we have says one plan is better, don't we have to choose it?

I'm mostly concerned about the possibility itself -- it's not a matter
of tuning the cost. I agree that that approach would probably be
hopeless.

There is a principled (albeit fairly involved) way of addressing this.
The patch allows the optimizer to produce a plan that has 1 index
scan, that treats the first column as an index qual, and the second
column as a filter condition. There is no fundamental reason why we
can't just have 1 index scan that makes both columns into index quals
(instead of 3 highly duplicative variants of the same index scan).
That's what I'm working towards right now.
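
For reference, a hand-written stand-in for that transformation (using
the tenk1 regression-test query discussed in this thread; plan shape
not guaranteed):

  SELECT * FROM tenk1
  WHERE thousand = 42 AND tenthous = ANY (ARRAY[1, 3, 42]);
  -- expected: Index Cond: ((thousand = 42) AND
  --           (tenthous = ANY ('{1,3,42}'::integer[])))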

> If I understand correctly, you mean we couldn't use an Index Filter on
> a key column? That seems overly restrictive; there are plenty of
> clauses that might be useful as an Index Filter but cannot be an Index
> Cond for one reason or another.

I think that you're probably right about it being overly restrictive
-- that was just a starting point for discussion. Perhaps there is an
identifiable class of clauses that can benefit, but don't have the
downside that I'm concerned about.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

От
Peter Geoghegan
Дата:
On Mon, Jul 24, 2023 at 11:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > That might be true but I'm not sure what to do about that unless we
> > incorporate some "robustness" measure into the costing. If every
> > measure we have says one plan is better, don't we have to choose it?
>
> I'm mostly concerned about the possibility itself -- it's not a matter
> of tuning the cost. I agree that that approach would probably be
> hopeless.

This seems related to the fact that EXPLAIN doesn't expose the
difference between what Markus Winand calls "Access Predicates" and
"Index Filter Predicates", as explained here:

https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates

That is, both "Access Predicates" and "Index Filter Predicates" are
shown after an "Index Cond: " entry in Postgres EXPLAIN output, in
general. Even though these are two very different things. I believe
that the underlying problem for the implementation (the reason why we
can't easily break this out further in EXPLAIN output) is that we
don't actually know what kind of predicate it is ourselves -- at least
not until execution time. We wait until then to do nbtree
preprocessing/scan setup. Though perhaps we should do more of this
during planning instead [1], for several reasons (fixing this is just
one of those reasons).
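
A quick sketch of the ambiguity (using the regression-test index
tenk1_thous_tenthous on (thousand, tenthous); you may need
enable_seqscan = off to get this plan):

  EXPLAIN SELECT * FROM tenk1 WHERE thousand > 500 AND tenthous = 1;
  --   Index Cond: ((thousand > 500) AND (tenthous = 1))
  -- "thousand > 500" is an access predicate that positions the scan;
  -- "tenthous = 1" is checked against every tuple visited. EXPLAIN
  -- presents both identically.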

The risk to "robustness" for cases like the one I drew attention to on
this thread would probably have been obvious all along if EXPLAIN
output were more like what Markus would have us do -- he certainly has
a good point here, in general.

Breaking things out in EXPLAIN output along these lines might also
give us a better general sense of when a similar plan shift like this
was actually okay -- even according to something like my
non-traditional "query robustness" criteria. It's much harder for me
to argue that a shift in plans from what Markus calls an "Index Filter
Predicate" to what the patch will show under "Index Filter:" is a
novel risk. That would be a much less consequential difference,
because those two things are fairly similar anyway.

Besides, such a shift in plan would have to "legitimately win" for
things to work out like this. If we're essentially picking between two
different subtypes of "Index Filter Predicate", then there can't be
the same weird second order effects that we see when an "Access
Predicate" is out-competed by an "Index Filter Predicate". It's
possible that expression evaluation of a small-ish disjunctive
predicate like "Index Filter: ((tenthous = 1) OR (tenthous = 3) OR
(tenthous = 42))" will be faster than a natively executed SAOP. You
can't do that kind of expression evaluation in the index AM itself
(assuming that there is an opclass for nbtree to use in the first
place, which there might not be in the case of any non-key INCLUDE
columns). With the patch, you can do all this. And I think that you
can derisk it without resorting to the overly conservative approach of
limiting ourselves to non-key columns from INCLUDE indexes.

To summarize: as Markus says on the same page, "Index filter
predicates give a false sense of safety; even though an index is used,
the performance degrades rapidly on a growing data volume or system
load". That's essentially what I want to avoid here. I'm much less
concerned about competition between what are really "Index Filter
Predicate" subtypes. Allowing that competition to take place is not
entirely risk-free, of course, but it seems about as risky as anything
else in this area.

[1] https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us
--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

От
Tomas Vondra
Дата:
On 8/2/23 02:50, Peter Geoghegan wrote:
> On Mon, Jul 24, 2023 at 11:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
>>> That might be true but I'm not sure what to do about that unless we
>>> incorporate some "robustness" measure into the costing. If every
>>> measure we have says one plan is better, don't we have to choose it?
>>
>> I'm mostly concerned about the possibility itself -- it's not a matter
>> of tuning the cost. I agree that that approach would probably be
>> hopeless.
> 
> This seems related to the fact that EXPLAIN doesn't expose the
> difference between what Markus Winand calls "Access Predicates" and
> "Index Filter Predicates", as explained here:
> 
> https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates
> 
> That is, both "Access Predicates" and "Index Filter Predicates" are
> shown after an "Index Cond: " entry in Postgres EXPLAIN output, in
> general. Even though these are two very different things. I believe
> that the underlying problem for the implementation (the reason why we
> can't easily break this out further in EXPLAIN output) is that we
> don't actually know what kind of predicate it is ourselves -- at least
> not until execution time. We wait until then to do nbtree
> preprocessing/scan setup. Though perhaps we should do more of this
> during planning instead [1], for several reasons (fixing this is just
> one of those reasons).
> 

How come we don't know that until execution time? Surely when
building the paths/plans, we match the clauses to the index keys, no? Or
are you saying that just having a scan key is not enough for it to be an
"access predicate"?

Anyway, that issue is mostly about "Index Cond" mixing two types of
predicates. But this patch is really about "Filter" predicates - moving
some of them from the table to the index. So it's quite similar to the
"index filter predicates", except that those are handled by the index AM.

> The risk to "robustness" for cases like the one I drew attention to on
> this thread would probably have been obvious all along if EXPLAIN
> output were more like what Markus would have us do -- he certainly has
> a good point here, in general.
> 
> Breaking things out in EXPLAIN output along these lines might also
> give us a better general sense of when a similar plan shift like this
> was actually okay -- even according to something like my
> non-traditional "query robustness" criteria. It's much harder for me
> to argue that a shift in plans from what Markus calls an "Index Filter
> Predicate" to what the patch will show under "Index Filter:" is a
> novel new risk. That would be a much less consequential difference,
> because those two things are fairly similar anyway.
> 

But differentiating between access and filter predicates (at the index
AM level) seems rather independent of what this patch aims to do.

FWIW I agree we should make the differences visible in EXPLAIN. That
seems fairly useful for non-trivial index access paths, and it does not
change the execution at all. I think it'd be fine to do that only for
VERBOSE mode, and only for EXPLAIN ANALYZE (if we only do this at
execution time for now).

> Besides, such a shift in plan would have to "legitimately win" for
> things to work out like this. If we're essentially picking between two
> different subtypes of "Index Filter Predicate", then there can't be
> the same weird second order effects that we see when an "Access
> Predicate" is out-competed by an "Index Filter Predicate". It's
> possible that expression evaluation of a small-ish disjunctive
> predicate like "Index Filter: ((tenthous = 1) OR (tenthous = 3) OR
> (tenthous = 42))" will be faster than a natively executed SAOP. You
> can't do that kind of expression evaluation in the index AM itself
> (assuming that there is an opclass for nbtree to use in the first
> place, which there might not be in the case of any non-key INCLUDE
> columns). With the patch, you can do all this. And I think that you
> can derisk it without resorting to the overly conservative approach of
> limiting ourselves to non-key columns from INCLUDE indexes.
> 

I'm not following. Why couldn't there be some second-order effects?
Maybe it's obvious / implied by something said earlier, but I think
every time we decide between multiple choices, there's a risk of picking
wrong.

Anyway, is this still about this patch or rather about your SAOP patch?

> To summarize: as Markus says on the same page, "Index filter
> predicates give a false sense of safety; even though an index is used,
> the performance degrades rapidly on a growing data volume or system
> load". That's essentially what I want to avoid here. I'm much less
> concerned about competition between what are really "Index Filter
> Predicate" subtypes. Allowing that competition to take place is not
> entirely risk-free, of course, but it seems about as risky as anything
> else in this area.
> 

True. IMHO the danger of "index filter" predicates is that people assume
index conditions eliminate large parts of the index - which is not
necessarily the case here. If we can improve this, cool.

But again, this is not what this patch does, right? It's about moving
stuff from "table filter" to "index filter". And those clauses were not
matched to the index AM at all, so it's not really relevant to the
discussion about different subtypes of predicates.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

От
Peter Geoghegan
Дата:
On Wed, Aug 2, 2023 at 6:48 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> How come we don't know that until execution time? Surely when
> building the paths/plans, we match the clauses to the index keys, no? Or
> are you saying that just having a scan key is not enough for it to be an
> "access predicate"?

In principle we can and probably should recognize the difference
between "Access Predicates" and "Index Filter Predicates" much earlier
on, during planning. Probably when index paths are first built.

It seems quite likely that there will turn out to be at least 2 or 3
reasons to do this. The EXPLAIN usability issue might be enough of a
reason on its own, though.

> Anyway, that issue is mostly about "Index Cond" mixing two types of
> predicates. But this patch is really about "Filter" predicates - moving
> some of them from the table to the index. So it's quite similar to the
> "index filter predicates", except that those are handled by the index AM.

I understand that that's how the patch is structured. It is
nevertheless possible (as things stand) that the patch will make the
planner shift from a plan that uses "Access Predicates" to the maximum
extent possible when scanning a composite index, to a similar plan
that has a similar index scan, for the same index, but with fewer
"Access Predicates" in total. In effect, the patched planner will swap
one type of predicate for another type because doing so enables the
executor to scan an index once instead of scanning it several times.

I don't dispute the fact that this can only happen when the planner
believes (with good reason) that the expected cost will be lower. But
I maintain that there is a novel risk to be concerned about, which is
meaningfully distinct from the general risk of regressions that comes
from making just about any change to the planner. The important
principle here is that we should have a strong bias in the direction
of making quals into true "Access Predicates" whenever practical.

Yeah, technically the patch doesn't directly disrupt how existing
index paths get generated. But it does have the potential to disrupt
it indirectly, by providing an alternative very-similar index path
that's likely to outcompete the existing one in these cases. I think
that we should have just one index path that does everything well
instead.

> But differentiating between access and filter predicates (at the index
> AM level) seems rather independent of what this patch aims to do.

My concern is directly related to the question of "access predicates
versus filter predicates", and the planner's current ignorance on
which is which. The difference may not matter too much right now, but
ISTM that your patch makes it matter a lot more. And so in that
indirect sense it does seem relevant.

The planner has always had a strong bias in the direction of making
clauses that can be index quals into index quals, rather than filter
predicates. It makes sense to do that even when they aren't very
selective. This is a similar though distinct principle.

It's awkward to discuss the issue, since we don't really have official
names for these things just yet (I'm just going with what Markus calls
them for now). And because there is more than one type of "index
filter predicate" in play with the patch (namely those in the index
AM, and those in the index scan executor node). But my concern boils
down to access predicates being strictly better than equivalent index
filter predicates.

> I'm not following. Why couldn't there be some second-order effects?
> Maybe it's obvious / implied by something said earlier, but I think
> every time we decide between multiple choices, there's a risk of picking
> wrong.

Naturally, I agree that some amount of risk is inherent. I believe
that the risk I have identified is qualitatively different to the
standard, inherent kind of risk.

> Anyway, is this still about this patch or rather about your SAOP patch?

It's mostly not about my SAOP patch. It's just that my SAOP patch will
do exactly the right thing in this particular case (at least once
combined with Alena Rybakina's OR-to-SAOP transformation patch). It is
therefore quite clear that a better plan is possible in principle.
Clearly we *can* have the benefit of only one single index scan (i.e.
no BitmapOr node combining multiple similar index scans), without
accepting any novel risk to get that benefit. We should just have
one index path that does it all, while avoiding duplicative index
paths that embody a distinction that really shouldn't exist -- it
should be dealt with at runtime instead.

> True. IMHO the danger of "index filter" predicates is that people assume
> index conditions eliminate large parts of the index - which is not
> necessarily the case here. If we can improve this, cool.

I think that we'd find a way to use the same information in the
planner, too. It's not just users that should care about the
difference. And I don't think that it's particularly likely to be
limited to SAOP/MDAM stuff. As Markus said, we're talking about an
important general principle here.

> But again, this is not what this patch does, right? It's about moving
> stuff from "table filter" to "index filter". And those clauses were not
> matched to the index AM at all, so it's not really relevant to the
> discussion about different subtypes of predicates.

I understand that what I've said is not particularly helpful. At least
not on its own. You quite naturally won't want to tie the fate of this
patch to my SAOP patch, which is significantly more complicated. I do
think that my concern about this being a novel risk needs to be
carefully considered.

Maybe it's possible to address my concern outside of the context of my
own SAOP patch. That would definitely be preferable all around. But
"access predicates versus filter predicates" seems important and
relevant, either way.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

От
Peter Geoghegan
Дата:
On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
>
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.

You can see this for yourself, quite easily. Start by running the
relevant query from the regression tests, which is:

SELECT * FROM tenk1
WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);

EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query
slightly faster, as expected. I see 7 buffer hits for the bitmap index
scan plan on master, versus only 4 buffer hits for the patch's index
scan. Obviously, this is because we go from multiple index scans
(multiple bitmap index scans) to only one.

But, if I run this insert statement and try the same thing again,
things look very different:

insert into tenk1 (thousand, tenthous) select 42, i from
generate_series(43, 1000) i;

(Bear in mind that we've inserted rows that don't actually need to be
selected by the query in question.)

Now the master branch's plan works in just the same way as before --
it has exactly the same overhead (7 buffer hits). Whereas the patch
still gets the same risky plan -- which now blows up. The plan now
loses by far more than it could ever really hope to win by: 336 buffer
hits. (It could be a lot higher than this, even, but you get the
picture.)

Sure, it's difficult to imagine a very general model that captures
this sort of risk, in the general case. But you don't need a degree in
actuarial science to understand that it's inherently a bad idea to
juggle chainsaws -- no matter what your general risk tolerance happens
to be. Some things you just don't do.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

От
Tomas Vondra
Дата:
On 8/3/23 07:26, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> I don't dispute the fact that this can only happen when the planner
>> believes (with good reason) that the expected cost will be lower. But
>> I maintain that there is a novel risk to be concerned about, which is
>> meaningfully distinct from the general risk of regressions that comes
>> from making just about any change to the planner. The important
>> principle here is that we should have a strong bias in the direction
>> of making quals into true "Access Predicates" whenever practical.
>>

OK, preference for access predicates sounds like a reasonable principle.

>> Yeah, technically the patch doesn't directly disrupt how existing
>> index paths get generated. But it does have the potential to disrupt
>> it indirectly, by providing an alternative very-similar index path
>> that's likely to outcompete the existing one in these cases. I think
>> that we should have just one index path that does everything well
>> instead.
> 
> You can see this for yourself, quite easily. Start by running the
> relevant query from the regression tests, which is:
> 
> SELECT * FROM tenk1
> WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
> 
> EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query
> slightly faster, as expected. I see 7 buffer hits for the bitmap index
> scan plan on master, versus only 4 buffer hits for the patch's index
> scan. Obviously, this is because we go from multiple index scans
> (multiple bitmap index scans) to only one.
> 
> But, if I run this insert statement and try the same thing again,
> things look very different:
> 
> insert into tenk1 (thousand, tenthous) select 42, i from
> generate_series(43, 1000) i;
> 
> (Bear in mind that we've inserted rows that don't actually need to be
> selected by the query in question.)
> 
> Now the master branch's plan works in just the same way as before --
> it has exactly the same overhead (7 buffer hits). Whereas the patch
> still gets the same risky plan -- which now blows up. The plan now
> loses by far more than it could ever really hope to win by: 336 buffer
> hits. (It could be a lot higher than this, even, but you get the
> picture.)

Are you sure? Because if I try this on master (62e9af4c without any
patches), I get this:

regression=# explain (verbose, analyze, buffers) SELECT * FROM tenk1
WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);

                              QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on public.tenk1
       (cost=0.29..1416.32 rows=1 width=244)
       (actual time=0.078..1.361 rows=1 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand,
           twothousand, fivethous, tenthous, odd, even, stringu1,
           stringu2, string4
   Index Cond: (tenk1.thousand = 42)
   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
            (tenk1.tenthous = 42))
   Rows Removed by Filter: 967
   Buffers: shared hit=335
 Planning Time: 0.225 ms
 Execution Time: 1.430 ms
(8 rows)

So I'm not sure about the claim that master works the same as before. OTOH, the
patched branch (with "my" patch 2023/07/16, just to be clear) does this:

                              QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on public.tenk1
       (cost=0.29..23.57 rows=1 width=244)
       (actual time=0.077..0.669 rows=1 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand,
           twothousand, fivethous, tenthous, odd, even, stringu1,
           stringu2, string4
   Index Cond: (tenk1.thousand = 42)
   Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
                  (tenk1.tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
            (tenk1.tenthous = 42))
   Buffers: shared hit=7
 Planning Time: 0.211 ms
 Execution Time: 0.722 ms
(9 rows)

Which is just the 7 buffers ...

Did I do something wrong?

> 
> Sure, it's difficult to imagine a very general model that captures
> this sort of risk, in the general case. But you don't need a degree in
> actuarial science to understand that it's inherently a bad idea to
> juggle chainsaws -- no matter what your general risk tolerance happens
> to be. Some things you just don't do.
> 

Oh come on. I've been juggling chainsaws (lit on fire!) since I was a
little boy! There's nothing risky about it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

От
Tomas Vondra
Дата:

On 8/3/23 03:32, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 6:48 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> How come we don't know that until execution time? Surely when
>> building the paths/plans, we match the clauses to the index keys, no? Or
>> are you saying that just having a scan key is not enough for it to be an
>> "access predicate"?
> 
> In principle we can and probably should recognize the difference
> between "Access Predicates" and "Index Filter Predicates" much earlier
> on, during planning. Probably when index paths are first built.
> 
> It seems quite likely that there will turn out to be at least 2 or 3
> reasons to do this. The EXPLAIN usability issue might be enough of a
> reason on its own, though.
> 

OK

>> Anyway, that issue is mostly about "Index Cond" mixing two types of
>> predicates. But this patch is really about "Filter" predicates - moving
>> some of them from the table to the index. So it's quite similar to the
>> "index filter predicates", except that those are handled by the index AM.
> 
> I understand that that's how the patch is structured. It is
> nevertheless possible (as things stand) that the patch will make the
> planner shift from a plan that uses "Access Predicates" to the maximum
> extent possible when scanning a composite index, to a similar plan
> that has a similar index scan, for the same index, but with fewer
> "Access Predicates" in total. In effect, the patched planner will swap
> one type of predicate for another type because doing so enables the
> executor to scan an index once instead of scanning it several times.
> 

That seems very much like something the costing is meant to handle, no?
I mean, surely "access predicate" and "filter" should affect the cost
differently, with "filter" being more expensive (and table filter being
more expensive than index filter).

I was not sure why the patch would affect the plan choice, as it does
not really affect the index predicates (passed to the AM) at all. But I
think I get the point now - the issue is having two different index
paths, like:

  PATH #1: access predicates (A,B)
  PATH #2: access predicate A, index filter B

AFAICS the assumption is that path #1 should be better, as it has two
proper access predicates. But maybe if we add another condition C, it
might end up like this:

  PATH #1: access predicates (A,B), table filter C
  PATH #2: access predicate A, index filter (B,C)

and #2 will end up winning.
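
For concreteness, a hypothetical pair of indexes that could produce
exactly those two paths (index names invented; table t from the start
of the thread, with d standing in for C):

  CREATE INDEX t_a_b_idx2 ON t (a, b);
  -- PATH #1: access predicates (a,b), table filter on d

  CREATE INDEX t_a_b_d_incl ON t (a) INCLUDE (b, d);
  -- PATH #2: access predicate a, index filters on (b,d)

  SELECT * FROM t WHERE a = 1 AND b = 2 AND d = 3;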

I still think this seems more like a costing issue (and I'd guess we may
already have similar cases for index-only scans).

IMO we can either consider the different predicate types during costing.
Sure, then we have the usual costing risks, but that's expected.

Or we could just ignore this during costing entirely (and ditch the
costing from the patch). Then the cost doesn't change, and we don't have
any new risks.

> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
> 
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.
> 

Yeah, I think that's the scenario I described above ...

>> But differentiating between access and filter predicates (at the index
>> AM level) seems rather independent of what this patch aims to do.
> 
> My concern is directly related to the question of "access predicates
> versus filter predicates", and the planner's current ignorance on
> which is which. The difference may not matter too much right now, but
> ISTM that your patch makes it matter a lot more. And so in that
> indirect sense it does seem relevant.
> 

I'm not sure my patch makes it matter a lot more. Yes, moving a filter
from the table to the index may lower the scan cost, but that can happen
for a lot of other reasons ...

> The planner has always had a strong bias in the direction of making
> clauses that can be index quals into index quals, rather than filter
> predicates. It makes sense to do that even when they aren't very
> selective. This is a similar though distinct principle.
> 
> It's awkward to discuss the issue, since we don't really have official
> names for these things just yet (I'm just going with what Markus calls
> them for now). And because there is more than one type of "index
> filter predicate" in play with the patch (namely those in the index
> AM, and those in the index scan executor node). But my concern boils
> down to access predicates being strictly better than equivalent index
> filter predicates.
> 

I like the discussion, but it feels a bit abstract (and distant from
what the patch aimed to do) and I have trouble turning it into something
actionable.

>> I'm not following. Why couldn't there be some second-order effects?
>> Maybe it's obvious / implied by something said earlier, but I think
>> every time we decide between multiple choices, there's a risk of picking
>> wrong.
> 
> Naturally, I agree that some amount of risk is inherent. I believe
> that the risk I have identified is qualitatively different to the
> standard, inherent kind of risk.
> 
>> Anyway, is this still about this patch or rather about your SAOP patch?
> 
> It's mostly not about my SAOP patch. It's just that my SAOP patch will
> do exactly the right thing in this particular case (at least once
> combined with Alena Rybakina's OR-to-SAOP transformation patch). It is
> therefore quite clear that a better plan is possible in principle.
> Clearly we *can* have the benefit of only one single index scan (i.e.
> no BitmapOr node combining multiple similar index scans), without
> accepting any novel risk to get that benefit. We should just have
> one index path that does it all, while avoiding duplicative index
> paths that embody a distinction that really shouldn't exist -- it
> should be dealt with at runtime instead.
> 

Does this apply to the index scan vs. index-only scans too? That is, do
you think we should have just one index-scan path, doing index-only
stuff when possible?

>> True. IMHO the danger of "index filter" predicates is that people assume
>> index conditions eliminate large parts of the index - which is not
>> necessarily the case here. If we can improve this, cool.
> 
> I think that we'd find a way to use the same information in the
> planner, too. It's not just users that should care about the
> difference. And I don't think that it's particularly likely to be
> limited to SAOP/MDAM stuff. As Markus said, we're talking about an
> important general principle here.
> 

If we want/need to consider this during costing, this seems necessary.

>> But again, this is not what this patch does, right? It's about moving
>> stuff from "table filter" to "index filter". And those clauses were not
>> matched to the index AM at all, so it's not really relevant to the
>> discussion about different subtypes of predicates.
> 
> I understand that what I've said is not particularly helpful. At least
> not on its own. You quite naturally won't want to tie the fate of this
> patch to my SAOP patch, which is significantly more complicated. I do
> think that my concern about this being a novel risk needs to be
> carefully considered.
> 
> Maybe it's possible to address my concern outside of the context of my
> own SAOP patch. That would definitely be preferable all around. But
> "access predicates versus filter predicates" seems important and
> relevant, either way.
> 

If we can form some sort of plan for what needs to be done (both for my
patch and for the SAOP patch), I'm willing to work on it ... But it's
not quite clear to me what the requirements are.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

От
Peter Geoghegan
Дата:
On Thu, Aug 3, 2023 at 4:20 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Which is just the 7 buffers ...
>
> Did I do something wrong?

I think that it might have something to do with your autovacuum
settings. Note that the plan that you've shown for the master branch
isn't the same one that appears in
src/test/regress/expected/create_index.out for the master branch --
that plan (the BitmapOr plan) was my baseline case for master.

That said, I am a little surprised that you could ever get the plan
that you showed for master (without somehow unnaturally forcing it).
It's almost the same plan that your patch gets, but much worse. Your
patch can use an index filter, but master uses a table filter instead.

While the plan used by the patch is risky in the way that I described,
the plan you saw for master is just horrible. I mean, it's not even
risky -- it seems almost certain to lose. Whereas at least the plan
from the patch really is cheaper than the BitmapOr plan (the master
branch plan from create_index.out) on average.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

От
Tomas Vondra
Дата:
On 8/3/23 18:47, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 4:20 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Which is just the 7 buffers ...
>>
>> Did I do something wrong?
> 
> I think that it might have something to do with your autovacuum
> settings. Note that the plan that you've shown for the master branch
> isn't the same one that appears in
> src/test/regress/expected/create_index.out for the master branch --
> that plan (the BitmapOr plan) was my baseline case for master.
> 
> That said, I am a little surprised that you could ever get the plan
> that you showed for master (without somehow unnaturally forcing it).
> It's almost the same plan that your patch gets, but much worse. Your
> patch can use an index filter, but master uses a table filter instead.
> 

Well I did force it - I thought we were talking about regular index scans,
so I disabled bitmap scans. Without doing that I get the BitmapOr plan
like you.

However, with the patch I get this behavior (starting from a "fresh"
state right after "make installcheck")

                                QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1
       (cost=0.29..8.38 rows=1 width=244)
       (actual time=0.033..0.036 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 9
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared read=4
 Planning:
   Buffers: shared hit=119 read=32
 Planning Time: 0.673 ms
 Execution Time: 0.116 ms
(10 rows)

insert into tenk1 (thousand, tenthous) select 42, i from
generate_series(43, 1000) i;

                                QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1
       (cost=0.29..8.38 rows=1 width=244)
       (actual time=0.038..0.605 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared hit=336
 Planning Time: 0.114 ms
 Execution Time: 0.632 ms
(8 rows)

analyze tenk1;

                                QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=12.89..16.91 rows=1 width=244)
                            (actual time=0.016..0.019 rows=1 loops=1)
   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR
                  ((thousand = 42) AND (tenthous = 3)) OR
                  ((thousand = 42) AND (tenthous = 42)))
   Heap Blocks: exact=1
   Buffers: shared hit=7
   ->  BitmapOr  ...
         Buffers: shared hit=6
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 1))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 3))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous ...
               Index Cond: ((thousand = 42) AND (tenthous = 42))
               Buffers: shared hit=2
 Planning Time: 0.344 ms
 Execution Time: 0.044 ms
(19 rows)

vacuum analyze tenk1;

                                QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=12.89..16.91 rows=1 width=244)
                            (actual time=0.017..0.019 rows=1 loops=1)
   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR
                  ((thousand = 42) AND (tenthous = 3)) OR
                  ((thousand = 42) AND (tenthous = 42)))
   Heap Blocks: exact=1
   Buffers: shared hit=7
   ->  BitmapOr  ...
         Buffers: shared hit=6
         ->  Bitmap Index Scan on tenk1_thous_tenthous  ...
               Index Cond: ((thousand = 42) AND (tenthous = 1))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous  ...
               Index Cond: ((thousand = 42) AND (tenthous = 3))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous  ...
               Index Cond: ((thousand = 42) AND (tenthous = 42))
               Buffers: shared hit=2
 Planning Time: 0.277 ms
 Execution Time: 0.046 ms
(19 rows)

set enable_bitmapscan = off;

                                QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1
     (cost=0.29..23.57 rows=1 width=244)
     (actual time=0.042..0.235 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared hit=7
 Planning Time: 0.119 ms
 Execution Time: 0.261 ms
(8 rows)

So yeah, it gets

   Buffers: shared hit=336

right after the insert, but it seems to be mostly about the visibility
map (and having to fetch heap tuples), as it disappears after vacuum.

There seems to be some increase in cost, so we switch back to the bitmap
plan. I haven't looked into that, but I guess there's either some thinko
in the costing change, or maybe it's due to correlation.

> While the plan used by the patch is risky in the way that I described,
> the plan you saw for master is just horrible. I mean, it's not even
> risky -- it seems almost certain to lose. Whereas at least the plan
> from the patch really is cheaper than the BitmapOr plan (the master
> branch plan from create_index.out) on average.
> 

Not sure. I'm a bit confused about what exactly is so risky about the
plan produced with the patch.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

От
Peter Geoghegan
Дата:
On Thu, Aug 3, 2023 at 4:57 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> > I understand that that's how the patch is structured. It is
> > nevertheless possible (as things stand) that the patch will make the
> > planner shift from a plan that uses "Access Predicates" to the maximum
> > extent possible when scanning a composite index, to a similar plan
> > that has a similar index scan, for the same index, but with fewer
> > "Access Predicates" in total. In effect, the patched planner will swap
> > one type of predicate for another type because doing so enables the
> > executor to scan an index once instead of scanning it several times.
> >
>
> That seems very much like something the costing is meant to handle, no?
> I mean, surely "access predicate" and "filter" should affect the cost
> differently, with "filter" being more expensive (and table filter being
> more expensive than index filter).

I'm not 100% sure that it's not a costing issue, but intuitively it
doesn't seem like one.

As Goetz Graefe once said, "choice is confusion". It seems desirable
to have fewer, better index paths. This is possible whenever there is
a way to avoid the index paths that couldn't possibly be better in the
first place. Though we must also make sure that there is no real
downside -- possibly by teaching the executor to behave adaptively
instead of needlessly trusting what the planner says. Turning a plan
time decision into a runtime decision seems strictly better.

Obviously the planner will always need to be trusted to a significant
degree (especially for basic things like join order), but why not
avoid it when we can avoid it without any real downsides? Having lots
of slightly different index paths with slightly different types of
logically equivalent predicates seems highly undesirable, and quite
avoidable.

ISTM that it should be possible to avoid generating some of these
index paths based on static rules that assume that:

1. An "access predicate" is always strictly better than an equivalent
"index filter predicate" (for any definition of "index filter
predicate" you can think of).
2. An "Index Filter: " is always strictly better than an equivalent
"Filter: " (i.e. table filter).

The first item is what I've been going on about, of course. The second
item is the important principle behind your patch -- and one that I
also agree with. I don't see any contradictions here -- these two
principles are compatible. I think that we can have it both ways.

> AFAICS the assumption is that path #1 should be better, as it has two
> proper access predicates. But maybe if we add another condition C, it
> might end up like this:
>
>   PATH #1: access predicates (A,B), table filter C
>   PATH #2: access predicate A, index filter (B,C)
>
> and #2 will end up winning.

Why wouldn't we expect there to also be this path:

PATH #3: access predicates (A,B), index filter C

And why wouldn't we also expect this other path to always be better?
So much better that we don't even need to bother generating PATH #1
and PATH #2 in the first place, even?

Right now there are weird reasons why it might not be so -- strange
interactions with things like BitmapOr nodes that could make either
PATH #1 or PATH #2 look slightly cheaper. But that doesn't seem
particularly fundamental to me. We should probably just avoid those
plan shapes that have the potential to make PATH #1 and PATH #2
slightly cheaper, due only to perverse interactions.

> I like the discussion, but it feels a bit abstract (and distant from
> what the patch aimed to do) and I have trouble turning it into something
> actionable.

I think that I have gotten a lot out of this discussion -- it has made
my thinking about this stuff more rigorous. I really appreciate that.

> Does this apply to the index scan vs. index-only scans too? That is, do
> you think we should have just one index-scan path, doing index-only
> stuff when possible?

I think so, yes. But index-only scans don't appear under BitmapOr
nodes, so right now I can't think of an obvious way of demonstrating
that this is true. Maybe it accidentally doesn't come up with
index-only scans in practice, but the same underlying principles
should be just as true.

> If we can form some sort of plan for what needs to be done (both for my
> patch and for the SAOP patch), I'm willing to work on it ... But it's
> not quite clear to me what the requirements are.

I do hope to have more concrete proposals soon. Thanks for being patient.

For what it's worth, I actually think that there is a good chance that
I'll end up relying on what you've done here to make certain things I
want to do with the SAOP patch okay. It would be rather convenient to
be able to handle some of the SAOP safety issues without needing any
table filters (just index filters), in some corner cases. I think that
what you're doing here makes a lot of sense. FWIW, I am already
personally invested in the success of your patch.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

От
Peter Geoghegan
Дата:
On Thu, Aug 3, 2023 at 11:17 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Not sure. I'm a bit confused about what exactly is so risky about the
> plan produced with the patch.

It's all about the worst case. In the scenarios that I'm concerned
about, we can be quite sure that the saving from not using a BitmapOr
will be fairly low -- the cost of not having to repeat the same index
page accesses across several similar index scans is, at best, some
small multiple of the would-be number of index scans that the BitmapOr
plan gets. We can be certain that the possible benefits are fixed and
low. This is always true; presumably the would-be BitmapOr plan can
never have all that many index scans. And we always know how many
index scans a BitmapOr plan would use up front.
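
Concretely, with the numbers from earlier in this thread: the would-be
BitmapOr plan uses three index descents and costs 7 buffer hits, versus
4 for the single index scan, so the upside here is bounded at roughly 3
page accesses.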

On the other hand, the possible downsides have no obvious limit. So
even if we're almost certain to win on average, we only have to be
unlucky once to lose all we gained before that point. As a general
rule, we want the index AM to have all the context required to
terminate its scan at the earliest possible opportunity. This is
enormously important in the worst case.

It's easier for me to make this argument because I know that we don't
really need to make any trade-off here. But even if that wasn't the
case, I'd probably arrive at the same general conclusion.

Importantly, it isn't possible to make a similar argument that works
in the opposite direction -- IMV that's the difference between this
flavor of riskiness, and the inevitable riskiness that comes with any
planner change. In other words, your patch isn't going to win by an
unpredictably high amount. Not in the specific scenarios that I'm
focussed on here, with a BitmapOr + multiple index scans getting
displaced.

The certainty about the upside is just as important as the uncertainty
about the downside. The huge asymmetry matters, and is fairly
atypical. If, somehow, there was less certainty about the possible
upside, then my argument wouldn't really work.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

От
Tomas Vondra
Дата:

On 8/3/23 20:50, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 4:57 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>> I understand that that's how the patch is structured. It is
>>> nevertheless possible (as things stand) that the patch will make the
>>> planner shift from a plan that uses "Access Predicates" to the maximum
>>> extent possible when scanning a composite index, to a similar plan
>>> that has a similar index scan, for the same index, but with fewer
>>> "Access Predicates" in total. In effect, the patched planner will swap
>>> one type of predicate for another type because doing so enables the
>>> executor to scan an index once instead of scanning it several times.
>>>
>>
>> That seems very much like something the costing is meant to handle, no?
>> I mean, surely "access predicate" and "filter" should affect the cost
>> differently, with "filter" being more expensive (and table filter being
>> more expensive than index filter).
> 
> I'm not 100% sure that it's not a costing issue, but intuitively it
> doesn't seem like one.
> 
> As Goetz Graefe once said, "choice is confusion". It seems desirable
> to have fewer, better index paths. This is possible whenever there is
> a way to avoid the index paths that couldn't possibly be better in the
> first place. Though we must also make sure that there is no real
> downside -- possibly by teaching the executor to behave adaptively
> instead of needlessly trusting what the planner says. Turning a plan
> time decision into a runtime decision seems strictly better.
> 

Sure, having more choices means a risk of making mistakes. But does
simply postponing the choices to runtime actually solve this? Even if
you're able to make a perfect choice at that point, it only works for
that particular path type (e.g. index scan). You still need to cost it
somehow, to decide which path type to pick ...

Perhaps my mental model of what you intend to do is wrong?

> Obviously the planner will always need to be trusted to a significant
> degree (especially for basic things like join order), but why not
> avoid it when we can avoid it without any real downsides? Having lots
> of slightly different index paths with slightly different types of
> logically equivalent predicates seems highly undesirable, and quite
> avoidable.
> 
> ISTM that it should be possible to avoid generating some of these
> index paths based on static rules that assume that:
> 
> 1. An "access predicate" is always strictly better than an equivalent
> "index filter predicate" (for any definition of "index filter
> predicate" you can think of).

Yes, probably.

> 2. An "Index Filter: " is always strictly better than an equivalent
> "Filter: " (i.e. table filter).

Not sure about this. As I explained earlier, I think it needs to
consider the cost/selectivity of the predicate, and the fraction of
all-visible pages. But yes, it's a static decision.
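
Roughly (a back-of-the-envelope sketch, not the actual cost model):

  expected heap fetches ~= rows matching the index cond
                           * (1 - fraction of all-visible pages)

so a cheap, selective index filter over mostly all-visible pages is a
clear win, while an expensive filter over rarely all-visible pages may
not pay for itself.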

> 
> The first item is what I've been going on about, of course. The second
> item is the important principle behind your patch -- and one that I
> also agree with. I don't see any contradictions here -- these two
> principles are compatible. I think that we can have it both ways.
> 
>> AFAICS the assumption is that path #1 should be better, as it has two
>> proper access predicates. But maybe if we add another condition C, it
>> might end up like this:
>>
>>   PATH #1: access predicates (A,B), table filter C
>>   PATH #2: access predicate A, index filter (B,C)
>>
>> and #2 will end up winning.
> 
> Why wouldn't we expect there to also be this path:
> 
> PATH #3: access predicates (A,B), index filter C
> 

Why? Maybe the index doesn't have all the columns needed for condition
C, in which case it has to be evaluated as a table filter.

(I didn't say it explicitly, but this assumes those paths are not for
the same index. If they were, then PATH #3 would have to exist too.)

> And why wouldn't we also expect this other path to always be better?
> So much better that we don't even need to bother generating PATH #1
> and PATH #2 in the first place, even?
> 

Yes, I agree that path would likely be superior to the other paths.

> Right now there are weird reasons why it might not be so -- strange
> interactions with things like BitmapOr nodes that could make either
> PATH #1 or PATH #2 look slightly cheaper. But that doesn't seem
> particularly fundamental to me. We should probably just avoid those
> plan shapes that have the potential to make PATH #1 and PATH #2
> slightly cheaper, due only to perverse interactions.
> 
>> I like the discussion, but it feels a bit abstract (and distant from
>> what the patch aimed to do) and I have trouble turning it into something
>> actionable.
> 
> I think that I have gotten a lot out of this discussion -- it has made
> my thinking about this stuff more rigorous. I really appreciate that.
> 

I feel a bit like the rubber duck from [1], but I'm OK with that ;-)

>> Does this apply to the index scan vs. index-only scans too? That is, do
>> you think we should have just one index-scan path, doing index-only
>> stuff when possible?
> 
> I think so, yes. But index-only scans don't appear under BitmapOr
> nodes, so right now I can't think of an obvious way of demonstrating
> that this is true. Maybe it accidentally doesn't come up with
> index-only scans in practice, but the same underlying principles
> should be just as true.
> 
>> If we can form some sort of plan for what needs to be done (both for my
>> patch and for the SAOP patch), I'm willing to work on it ... But it's
>> not quite clear to me what the requirements are.
> 
> I do hope to have more concrete proposals soon. Thanks for being patient.
> 
> For what it's worth, I actually think that there is a good chance that
> I'll end up relying on what you've done here to make certain things I
> want to do with the SAOP patch okay. It would be rather convenient to
> be able to handle some of the SAOP safety issues without needing any
> table filters (just index filters), in some corner cases. I think that
> what you're doing here makes a lot of sense. FWIW, I am already
> personally invested in the success of your patch.
> 

Great! I'm happy those bits are likely useful for what you're doing.

regards


[1] https://en.wikipedia.org/wiki/Rubber_duck_debugging

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

От
Tomas Vondra
Дата:

On 8/3/23 21:21, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 11:17 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Not sure. I'm a bit confused about what exactly is so risky about the
>> plan produced with the patch.
> 
> It's all about the worst case. In the scenarios that I'm concerned
> about, we can be quite sure that the saving from not using a BitmapOr
> will be fairly low -- the cost of not having to repeat the same index
> page accesses across several similar index scans is, at best, some
> small multiple of the would-be number of index scans that the BitmapOr
> plan gets. We can be certain that the possible benefits are fixed and
> low. This is always true; presumably the would-be BitmapOr plan can
> never have all that many index scans. And we always know how many
> index scans a BitmapOr plan would use up front.
> 

When you say "index page accesses" do you mean accesses to index pages,
or accesses to heap pages from the index scan?

Because my patch is all about reducing heap page accesses, which are
usually the expensive part of an index scan. But you're right that the
"index scan" with an index filter may access more index pages, because
it has fewer "access predicates".

I don't quite see that with the tenk1 query we've been discussing (the
extra buffers were due to non-allvisible heap pages), but I guess that's
possible.

> On the other hand, the possible downsides have no obvious limit. So
> even if we're almost certain to win on average, we only have to be
> unlucky once to lose all we gained before that point. As a general
> rule, we want the index AM to have all the context required to
> terminate its scan at the earliest possible opportunity. This is
> enormously important in the worst case.
> 

Yeah, I agree there's some asymmetry in the risk/benefit. It's not
unlike e.g. seqscan vs. index scan, where the index scan can't really
save more than what the seqscan costs, but it can get (almost)
arbitrarily expensive.

> It's easier for me to make this argument because I know that we don't
> really need to make any trade-off here. But even if that wasn't the
> case, I'd probably arrive at the same general conclusion.
> 
> Importantly, it isn't possible to make a similar argument that works
> in the opposite direction -- IMV that's the difference between this
> flavor of riskiness, and the inevitable riskiness that comes with any
> planner change. In other words, your patch isn't going to win by an
> unpredictably high amount. Not in the specific scenarios that I'm
> focussed on here, with a BitmapOr + multiple index scans getting
> displaced.
> 

True. It probably can't beat the BitmapOr plan if it means moving an
access predicate into an index filter (or, even worse, a table filter).

> The certainty about the upside is just as important as the uncertainty
> about the downside. The huge asymmetry matters, and is fairly
> atypical. If, somehow, there was less certainty about the possible
> upside, then my argument wouldn't really work.
> 


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Thu, Aug 3, 2023 at 2:46 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Sure, having more choices means a risk of making mistakes. But does
> simply postponing the choices to runtime actually solve this?

It solves that one problem, yes. This is particularly important in
cases where we would otherwise get truly pathological performance --
not just mediocre or bad performance. Most of the time, mediocre
performance isn't such a big deal.

Having a uniform execution strategy for certain kinds of index scans
is literally guaranteed to beat a static strategy in some cases. For
example, with some SAOP scans (with my patch), we'll have to skip lots
of the index, and then scan lots of the index -- just because of a
bunch of idiosyncratic details that are almost impossible to predict
using statistics. Such an index scan really shouldn't be considered
"moderately skippy". It is not the average of two opposite things --
it is more like two separate things that are opposites.

It's true that even this "moderately skippy" case needs to be costed.
But if we can entirely eliminate variation that really looks like
noise, it should be more likely that we'll get the cheapest plan.
Costing may not be any easier, but getting the cheapest plan might be.

> > 1. An "access predicate" is always strictly better than an equivalent
> > "index filter predicate" (for any definition of "index filter
> > predicate" you can think of).
>
> Yes, probably.
>
> > 2. An "Index Filter: " is always strictly better than an equivalent
> > "Filter: " (i.e. table filter).
>
> Not sure about this. As I explained earlier, I think it needs to
> consider the cost/selectivity of the predicate, and fraction of
> allvisible pages. But yes, it's a static decision.

What I said is limited to "equivalent" predicates. If it's just not
possible to get an "access predicate" at all, then my point #1 doesn't
apply. Similarly, if it just isn't possible to get an "Index Filter"
(only a table filter), then my point #2 doesn't apply.

This does mean that there could still be competition between multiple
index paths for the same composite index, but I have no objections to
that -- it makes sense to me because it isn't duplicative in the way
that I'm concerned about. It just isn't possible to delay anything
until run time in this scenario, so nothing that I've said should
apply.

> (I didn't say it explicitly, but this assumes those paths are not for
> the same index. If they were, then PATH #3 would have to exist too.)

That changes everything, then. If they're completely different indexes
then nothing I've said should apply. I can't think of a way to avoid
making an up-front commitment to that in the planner (I'm thinking of
far more basic things than that).

> I feel a bit like the rubber duck from [1], but I'm OK with that ;-)

Not from my point of view. Besides, even when somebody says that they
just don't understand what I'm saying at all (which wasn't ever fully
the case here), that is often useful feedback in itself.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> When you say "index page accesses" do you mean accesses to index pages,
> or accesses to heap pages from the index scan?

Yes, that's exactly what I mean. Note that that's the dominant cost
for the original BitmapOr plan.

As I said upthread, the original BitmapOr plan has 7 buffer hits. The
breakdown is 1 single heap page access, 3 root page accesses, and 3
leaf page accesses. There is only 1 heap page access because only 1
out of the 3 index scans that feed into the BitmapOr actually ends up
finding any matching rows in the index.

In short, the dominant cost here is index page accesses. It's a
particularly good case for my SAOP patch!

> Because my patch is all about reducing the heap pages, which are usually
> the expensive part of the index scan. But you're right the "index scan"
> with index filter may access more index pages, because it has fewer
> "access predicates".

The fact is that your patch correctly picks the cheapest plan, which
is kinda like a risky version of the plan that my SAOP patch would
pick -- it is cheaper for the very same reason. I understand that
that's not your intention at all, but this is clearly what happened.
That's what I meant by "weird second order effects".

To me, it really does kinda look like your patch accidentally
discovered a plan that's fairly similar to the plan that my SAOP patch
would have found by design! Perhaps I should have been clearer on this
point earlier. (If you're only now seeing this for yourself for the
first time, then...oops. No wonder you were confused about which
patch it was I was going on about!)

> I don't quite see that with the tenk1 query we've been discussing (the
> extra buffers were due to non-allvisible heap pages), but I guess that's
> possible.

The extra buffer hits occur because I made them occur by inserting new
tuples where thousand = 42. Obviously, I did it that way because I had
a point that I wanted to make. Obviously, there wouldn't have been any
notable regression from your patch at all if I had (say) inserted
tuples where thousand = 43 instead. (Not for the original "42" query,
at least.)

That's part of the problem, as I see it. Local changes like that can
have outsized impact on individual queries, even though there is no
inherent reason to expect it. How can statistics reliably guide the
planner here? Statistics are supposed to be a summary of the whole
attribute, that allow us to make various generalizations during
planning. But this plan leaves us sensitive to relatively small
changes in one particular "thousand" grouping, with potentially
outsized impact. And, this can happen very suddenly, because it's so
"local".

Making this plan perform robustly just doesn't seem to be one of the
things that statistics can be expected to help us with very much.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Because my patch is all about reducing the heap pages, which are usually
> the expensive part of the index scan. But you're right the "index scan"
> with index filter may access more index pages, because it has fewer
> "access predicates".

It's not so much the unnecessary index page accesses that bother me.
At least I didn't push that aspect very far when I constructed my
adversarial test case -- index pages were only a small part of the
overall problem. (I mean the problem at runtime, in the executor. The
planner expected to save a small number of leaf page accesses, which
was kinda, sorta the problem there -- though the planner might have
technically still been correct about that, and can't have been too far
wrong in any case.)

The real problem that my adversarial case seemed to highlight was a
problem of extra heap page accesses. The tenk1 table's VM is less than
one page in size, so how could it have been VM buffer hits? Sure,
there were no "table filters" involved -- only "index filters". But
even "index filters" require heap access when the page isn't marked
all-visible in the VM.

That problem just cannot happen with a similar plan that eliminates
the same index tuples within the index AM proper (the index quals
don't even have to be "access predicates" for this to apply, either).
Of course, we never need to check the visibility of index tuples just
to be able to consider eliminating them via nbtree search scan
keys/index quals -- and so there is never any question of heap/VM
access for tuples that don't pass index quals. Not so for "index
filters", where there is at least some chance of accessing the heap
proper just to be able to eliminate non-matches.

While I think that it makes sense to assume that "index filters" are
strictly better than "table filters" (assuming they're directly
equivalent in that they contain the same clause), they're not
*reliably* any better. So "index filters" are far from being anywhere
near as good as an equivalent index qual (AFAICT we should always
assume that that's true). This is true of index quals generally --
this advantage is *not* limited to "access predicate index quals". (It
is most definitely not okay for "index filters" to displace equivalent
"access predicate index quals", but it's also not really okay to allow
them to displace equivalent "index filter predicate index quals" --
the latter case is less bad, but AFAICT they both basically aren't
acceptable "substitutions".)

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Tomas Vondra
Date:

On 8/4/23 02:08, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> When you say "index page accesses" do you mean accesses to index pages,
>> or accesses to heap pages from the index scan?
> 
> Yes, that's exactly what I mean. Note that that's the dominant cost
> for the original BitmapOr plan.
> 

Well, I presented multiple options, so "yes" doesn't really clarify
which of them applies. But my understanding is you meant the index page
accesses.

> As I said upthread, the original BitmapOr plan has 7 buffer hits. The
> breakdown is 1 single heap page access, 3 root page accesses, and 3
> leaf page accesses. There is only 1 heap page access because only 1
> out of the 3 index scans that feed into the BitmapOr actually ends up
> finding any matching rows in the index.
> 
> In short, the dominant cost here is index page accesses. It's a
> particularly good case for my SAOP patch!
> 

Understood. It makes sense considering the SAOP patch is all about
optimizing the index walk / processing fewer pages.

>> Because my patch is all about reducing the heap pages, which are usually
>> the expensive part of the index scan. But you're right the "index scan"
>> with index filter may access more index pages, because it has fewer
>> "access predicates".
> 
> The fact is that your patch correctly picks the cheapest plan, which
> is kinda like a risky version of the plan that my SAOP patch would
> pick -- it is cheaper for the very same reason. I understand that
> that's not your intention at all, but this is clearly what happened.
> That's what I meant by "weird second order effects".
> 
> To me, it really does kinda look like your patch accidentally
> discovered a plan that's fairly similar to the plan that my SAOP patch
> would have found by design! Perhaps I should have been clearer on this
> point earlier. (If you're only now seeing this for yourself for the
> first time, then...oops. No wonder you were confused about which
> patch it was I was going on about!)
> 

Thanks. I think I now see the relationship between the plan with my
patch and your SAOP patch. It's effectively very similar, except that
the responsibilities are split a bit differently. With my patch the OR
clause happens outside AM, while the SAOP patch would do that in the AM
and also use that to walk the index more efficiently.
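
To make that concrete, here are the two (logically equivalent) spellings
of the tenk1 test query - just a sketch to illustrate the split:

  -- with my patch, the OR clause can be evaluated as an index filter,
  -- outside the AM
  select * from tenk1
  where thousand = 42
    and (tenthous = 1 or tenthous = 3 or tenthous = 42);

  -- the SAOP spelling, which the AM itself can use to walk the index
  -- more efficiently
  select * from tenk1
  where thousand = 42
    and tenthous = any ('{1,3,42}'::int[]);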

>> I don't quite see that with the tenk1 query we've been discussing (the
>> extra buffers were due to non-allvisible heap pages), but I guess that's
>> possible.
> 
> The extra buffer hits occur because I made them occur by inserting new
> tuples where thousand = 42. Obviously, I did it that way because I had
> a point that I wanted to make. Obviously, there wouldn't have been any
> notable regression from your patch at all if I had (say) inserted
> tuples where thousand = 43 instead. (Not for the original "42" query,
> at least.)
> 

Right. I know where the heap accesses come from, but clearly we're able
to skip those when allvisible=true, and we don't need to scan more index
pages either. But I guess we could make the data more complex to make
this part worse (for my patch, while the SAOP patch would not be affected).

> That's part of the problem, as I see it. Local changes like that can
> have outsized impact on individual queries, even though there is no
> inherent reason to expect it. How can statistics reliably guide the
> planner here? Statistics are supposed to be a summary of the whole
> attribute, that allow us to make various generalizations during
> planning. But this plan leaves us sensitive to relatively small
> changes in one particular "thousand" grouping, with potentially
> outsized impact. And, this can happen very suddenly, because it's so
> "local".
> 
> Making this plan perform robustly just doesn't seem to be one of the
> things that statistics can be expected to help us with very much.
> 

I agree there certainly are cases where the estimates will be off. This
is not that different from correlated columns; in fact, it's exactly the
same issue, I think. But it's also not a novel/unique issue - we should
probably do the "usual" thing, i.e. plan based on estimates (maybe with
some safety margin), and have some fallback strategy at runtime.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Fri, Aug 4, 2023 at 4:47 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Well, I presented multiple options, so "yes" doesn't really clarify
> which of them applies. But my understanding is you meant the index page
> accesses.

Sorry. Your understanding of what I must have meant before was correct
-- your patch picked that plan because it reduced the number of index
page accesses significantly. Just like my SAOP patch would have.

> > In short, the dominant cost here is index page accesses. It's a
> > particularly good case for my SAOP patch!
> >
>
> Understood. It makes sense considering the SAOP patch is all about
> optimizing the index walk / processing fewer pages.

Actually, some of the most compelling cases for my SAOP patch are
those involving heap page access savings, which come from the planner
changes. Basically, the nbtree/executor changes make certain index
access patterns much more efficient. Which is useful in itself, but
often much more useful as an indirect enabler of avoiding heap page
accesses by altering other related aspects of a plan in the planner.
Sometimes by replacing table filter quals with index quals (the nbtree
changes make nbtree a strictly better place for the quals). You know,
somewhat like your patch.

That's really why I'm so interested in your patch, and its
relationship with my own patch, and the BitmapOr issue. If your patch
enables some tricks that are really quite similar to the tricks that
my own patch enables, then delineating which patch does which exact
trick when is surely important for both patches.

I actually started out just thinking about index page accesses, before
eventually coming to understand that heap page accesses were also very
relevant. Whereas you started out just thinking about heap page
accesses, and now see some impact from saving on index page accesses.

> Thanks. I think I now see the relationship between the plan with my
> patch and your SAOP patch. It's effectively very similar, except that
> the responsibilities are split a bit differently. With my patch the OR
> clause happens outside AM, while the SAOP patch would do that in the AM
> and also use that to walk the index more efficiently.

Right. That's where my idea of structuring things so that there is
only one best choice really comes from.

> I agree there certainly are cases where the estimates will be off. This
> is not that different from correlated columns; in fact, it's exactly the
> same issue, I think.

In one way I think you're right that it's the same issue -- if you
just focus on that one executor node, then it's the same issue. But I
don't think it's the same issue in a deeper sense, since this is one
case where you simply don't have to accept any risk. We really should
be able to just not ever do this, for a limited though important
subset of cases involving ORs + indexable operators.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Tomas Vondra
Date:
On 8/4/23 04:07, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Because my patch is all about reducing the heap pages, which are usually
>> the expensive part of the index scan. But you're right the "index scan"
>> with index filter may access more index pages, because it has fewer
>> "access predicates".
> 
> It's not so much the unnecessary index page accesses that bother me.
> At least I didn't push that aspect very far when I constructed my
> adversarial test case -- index pages were only a small part of the
> overall problem. (I mean the problem at runtime, in the executor. The
> planner expected to save a small number of leaf page accesses, which
> was kinda, sorta the problem there -- though the planner might have
> technically still been correct about that, and can't have been too far
> wrong in any case.)
> 

Thanks for the clarification, I think I understand better now.

Let me briefly sum up my understanding of the two patches:

- The SAOP patch eliminates those heap accesses because it manages to
evaluate all clauses in the AM, including clauses that would previously
be treated as "table filters" and evaluated on the heap row.

- My patch achieves a similar result by evaluating the clauses as index
filters (i.e. on the index tuple). That's probably not as good as proper
access predicates, so it can't help with the index page accesses, but
better than what we had before.

There's a couple more related thoughts later in my reply.

> The real problem that my adversarial case seemed to highlight was a
> problem of extra heap page accesses. The tenk1 table's VM is less than
> one page in size, so how could it have been VM buffer hits? Sure,
> there were no "table filters" involved -- only "index filters". But
> even "index filters" require heap access when the page isn't marked
> all-visible in the VM.
> 

No, the extra accesses were not because of VM buffer hits - it was
because of having to actually fetch the heap tuple for pages that are
not fully visible, which is what happens right after the insert.

The patch does what index-only scans do - before evaluating the
filters on an index tuple, we check if the page is fully visible. If
not, we fetch the heap tuple and evaluate the filters on it.

This means even an index-only scan would behave like this too. And it
goes away as the table gets vacuumed, at which point we can eliminate
the rows using only the index tuple again.
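
For illustration, with the tenk1 case discussed here that's something
like this sketch:

  -- once vacuum has marked the pages all-visible again, the filters
  -- can be evaluated on the index tuple alone, and the extra heap
  -- fetches disappear
  vacuum tenk1;

  explain (analyze, buffers)
  select * from tenk1
  where thousand = 42
    and (tenthous = 1 or tenthous = 3 or tenthous = 42);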

> That problem just cannot happen with a similar plan that eliminates
> the same index tuples within the index AM proper (the index quals
> don't even have to be "access predicates" for this to apply, either).
> Of course, we never need to check the visibility of index tuples just
> to be able to consider eliminating them via nbtree search scan
> keys/index quals -- and so there is never any question of heap/VM
> access for tuples that don't pass index quals. Not so for "index
> filters", where there is at least some chance of accessing the heap
> proper just to be able to eliminate non-matches.
> 

Right. This, however, raises a question - why would we actually need to
check the visibility map before evaluating the index filter, when the
index tuple alone is clearly good enough for the BitmapOr plan?

Because if we didn't need to do that VM check, this issue with extra
heap accesses would disappear.

I copied this from the IOS somewhat blindly, but now I'm starting to
think it was misguided. I thought it's a protection against processing
"invalid" tuples - not tuples broken after a crash (as that would be
fixed by crash recovery), but e.g. tuples with schema changes from an
aborted transaction.

But can this actually happen for indexes? For heap it's certainly
possible (BEGIN - ALTER - INSERT - ROLLBACK will leave behind tuples
like that). But we don't support changing indexes like this, right?

And if we had this issue, how come the bitmapOr plan (which ultimately
does the same thing, although entirely in the AM) does not need to do
these VM checks / heap accesses too? It's evaluating essentially the
same conditions on the index tuple ...

So I'm starting to think this is just my misunderstanding of why IOS does
this VM check - it's purely to determine visibility of the result. When
it sees a pointer to a page that is not all-visible, it decides it'll need
to check visibility on a heap tuple anyway, and just fetches the heap
tuple right away. Which however ignores that the filters may eliminate
many of those tuples, so IOS could also make such unnecessary heap
accesses. It might be better to check the filters first, and only then
maybe fetch the heap tuple ...

> While I think that it makes sense to assume that "index filters" are
> strictly better than "table filters" (assuming they're directly
> equivalent in that they contain the same clause), they're not
> *reliably* any better. So "index filters" are far from being anywhere
> near as good as an equivalent index qual (AFAICT we should always
> assume that that's true). This is true of index quals generally --
> this advantage is *not* limited to "access predicate index quals". (It
> is most definitely not okay for "index filters" to displace equivalent
> "access predicate index quals", but it's also not really okay to allow
> them to displace equivalent "index filter predicate index quals" --
> the latter case is less bad, but AFAICT they both basically aren't
> acceptable "substitutions".)
> 

I'm not quite sure what the differences are between "index qual" vs.
"access predicate index qual" vs. "index filter predicate index quals",
or what "displacing" would mean exactly. But I agree there's a hierarchy
of qual types, and some "promotions" are likely guaranteed to be good.


FWIW this also reminds me that this whole discussion mostly focused on
SAOP clauses (and how they may be translated into access predicates
etc.). My patch is however way more general - it applies to all clauses,
not just SAOP ones, including clauses that have no chance of being
evaluated at the AM level.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Fri, Aug 4, 2023 at 4:34 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Thanks for the clarification, I think I understand better now.

Honestly, it's gratifying to be understood at all in a discussion like
this one. Figuring out how to articulate some of my more subtle ideas
(without leaving out important nuances) can be a real struggle for me.

> Let me briefly sum up my understanding of the two patches:
>
> - The SAOP patch eliminates those heap accesses because it manages to
> evaluate all clauses in the AM, including clauses that would previously
> be treated as "table filters" and evaluated on the heap row.

Yes, exactly.

> - My patch achieves a similar result by evaluating the clauses as index
> filters (i.e. on the index tuple). That's probably not as good as proper
> access predicates, so it can't help with the index page accesses, but
> better than what we had before.

Yes, exactly.

> No, the extra accesses were not because of VM buffer hits - it was
> because of having to actually fetch the heap tuple for pages that are
> not fully visible, which is what happens right after the insert.

Yes, exactly.

> The patch does what index-only scans do - before evaluating the
> filters on an index tuple, we check if the page is fully visible. If
> not, we fetch the heap tuple and evaluate the filters on it.
>
> This means even an index-only scan would behave like this too. And it
> goes away as the table gets vacuumed, at which point we can eliminate
> the rows using only the index tuple again.

Yes, exactly.

> Right. This, however, raises a question - why would we actually need to
> check the visibility map before evaluating the index filter, when the
> index tuple alone is clearly good enough for the BitmapOr plan?
>
> Because if we didn't need to do that VM check, this issue with extra
> heap accesses would disappear.

The index AM is entitled to make certain assumptions of opclass
members -- assumptions that cannot be made during expression
evaluation. The classic example is division-by-zero during evaluation
of a qual, for a tuple that wasn't visible anyway. Our assumption is
that stuff like that just cannot happen with index quals -- users
shouldn't ever encounter sanity check errors caused by
invisible-to-their-MVCC-snapshot tuples.

I think that that's the main difficulty, as far as avoiding heap
access for index filters is concerned. Of course, you could just limit
yourself to those cases where the index AM assumptions were safe. But
at that point, why not simply make sure to generate true index quals,
and be done with it?

> I copied this from the IOS somewhat blindly, but now I'm starting to
> think it was misguided. I thought it's a protection against processing
> "invalid" tuples - not tuples broken after a crash (as that would be
> fixed by crash recovery), but e.g. tuples with schema changes from an
> aborted transaction.

I agree that schema changes for indexes shouldn't be an issue, though.

> I'm not quite sure what the differences are between "index qual" vs.
> "access predicate index qual" vs. "index filter predicate index quals",
> or what "displacing" would mean exactly.

For the purposes of this point about "a hierarchy of quals", it
doesn't really matter -- that was the point I was trying to make.

In other words, "index quals" are strictly better than equivalent
"index filters", which are themselves strictly better than equivalent
"table filters". While it is also true that you can meaningfully
classify "index quals" into their own hierarchy (namely access
predicates vs index filter predicates), that doesn't necessarily need
to be discussed when discussing the hierarchy from a planner point of
view, since it is (at least for now) internal to the nbtree index AM.
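
Here's a sketch of where each level of the hierarchy would show up,
assuming the tenk1 index on (thousand, tenthous) and your patch's
"Index Filter" output (an illustration, not real EXPLAIN output):

  select * from tenk1
  where thousand = 42             -- index qual: "Index Cond"
    and (tenthous % 10) = 2       -- index filter: evaluable on the
                                  -- index tuple, with your patch
    and stringu1 like '%x%';      -- table filter: needs the heap tuple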

On second thought, I tend to doubt that your patch needs to worry
about each type of index qual directly. It probably needs to worry
about index quals in general.

It is always better to make what could be an "index filter" into an
index qual. Of course, the current practical problem for you is
figuring out how to deal with that in cases like the BitmapOr case.
Since it's not as if the planner is wrong, exactly...it really is the
cheapest plan, so the planner is at least right on its own terms. I am
interested in finding a solution to that problem.

> FWIW this also reminds me that this whole discussion mostly focused on
> SAOP clauses (and how they may be translated into access predicates
> etc.). My patch is however way more general - it applies to all clauses,
> not just SAOP ones, including clauses that have no chance of being
> evaluated at the AM level.

I agree that your patch is way more general, in one way. But the SAOP
patch is also much more general than you might think.

For one thing, the whole BitmapOr plan issue actually required that you
compare your patch to a combination of my SAOP patch and Alena
Rybakina's OR-to-SAOP transformation patch -- you needed both patches.
Her patch effectively made my own patch much more general. But there
are all kinds of other transformations that might further extend the
applicability of nbtree executor changes from my patch -- the MDAM
techniques are certainly not limited to ORs/SAOPs. For example, it's
easy to imagine inventing a new type of SAOP-style clause that
represented "BETWEEN x AND y" expressions, that would make range
predicates into "first class indexable operators" --
ScalarRangeOprExpr, or something. With multi-column indexes, these
ScalarRangeOprExpr clauses could be composed beside ScalarArrayOpExpr
clauses, as well as simpler clauses -- all while preserving index
order on output. So quite a few plan shapes that aren't possible at all
right now become possible, many of which don't even involve SAOPs/ORs.
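
For instance, something like this sketch (nothing like
ScalarRangeOprExpr exists today, so this is purely hypothetical):

  -- a range predicate composed beside a SAOP on the (thousand,
  -- tenthous) index, with index order preserved on output
  select * from tenk1
  where thousand between 40 and 44
    and tenthous in (1, 3, 42)
  order by thousand, tenthous;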

Of course it won't ever be possible to create a transformation that
doesn't ultimately flatten everything into MDAM style "single value"
DNF predicates, which have to use simple B-Tree opclass operators --
obviously there are fundamental limits to it. So even in a perfect
world, with every possible MDAM-ish transformation implemented, we'll
still have a significant need for your patch.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Tomas Vondra
Date:
On 8/5/23 02:53, Peter Geoghegan wrote:
> ...
> 
>> Right. This, however, raises a question - why would we actually need to
>> check the visibility map before evaluating the index filter, when the
>> index tuple alone is clearly good enough for the BitmapOr plan?
>>
>> Because if we didn't need to do that VM check, this issue with extra
>> heap accesses would disappear.
> 
> The index AM is entitled to make certain assumptions of opclass
> members -- assumptions that cannot be made during expression
> evaluation. The classic example is division-by-zero during evaluation
> of a qual, for a tuple that wasn't visible anyway. Our assumption is
> that stuff like that just cannot happen with index quals -- users
> shouldn't ever encounter sanity check errors caused by
> invisible-to-their-MVCC-snapshot tuples.
> 

Thanks for reminding me, I keep forgetting about this.

> I think that that's the main difficulty, as far as avoiding heap
> access for index filters is concerned. Of course, you could just limit
> yourself to those cases where the index AM assumptions were safe. But
> at that point, why not simply make sure to generate true index quals,
> and be done with it?
> 

Yeah, if it's possible to generate true index quals, it'd be stupid not
to do that. But clearly there are cases where that's not possible (we
may not have the code doing that, or maybe it's just not possible in
principle).

Would it be possible to inspect the expression and determine if it's
"safe" to be treated almost like an index qual? Say, given a complex
expression, we might check if it consists only of expressions that could
be treated as an index qual. So a bit like leak-proof, which may
actually be relevant here too, I guess (int4div is not leak-proof,
for example, so division-by-zero would not allow index-qual
treatment).
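
The marking is visible directly in the catalogs, by the way:

  -- int4div is not marked leakproof (it can raise division-by-zero),
  -- while int4eq (behind the btree integer equality operator) is
  select proname, proleakproof
  from pg_proc
  where proname in ('int4div', 'int4eq');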

I now recall there probably was a past discussion about how leak-proof
relates to this, but IIRC the conclusion was it's not quite the same
thing. But maybe I just remember things wrong.

Anyway, I think there are always going to be clauses that would be safe
to evaluate on the index, but the index AM does not know how to handle them
for some reason. For example it might require extending the AM to handle
generic expressions, which doesn't seem quite desirable.

So I think I see three points where we could evaluate expressions:

1) in the AM, as access predicates / index quals (doing this more often
   is kinda what the SAOP patches aim to do)

2) in the index scan, before checking VM / visibility (if the qual is
   safe to be evaluated early)

3) in the index scan, after checking VM / visibility (if the expression
   does unsafe things)



>> I copied this from the IOS somewhat blindly, but now I'm starting to
>> think it was misguided. I thought it's a protection against processing
>> "invalid" tuples - not tuples broken after a crash (as that would be
>> fixed by crash recovery), but e.g. tuples with schema changes from an
>> aborted transaction.
> 
> I agree that schema changes for indexes shouldn't be an issue, though.
> 
>> I'm not quite sure what the differences are between "index qual" vs.
>> "access predicate index qual" vs. "index filter predicate index quals",
>> or what "displacing" would mean exactly.
> 
> For the purposes of this point about "a hierarchy of quals", it
> doesn't really matter -- that was the point I was trying to make.
> 
> In other words, "index quals" are strictly better than equivalent
> "index filters", which are themselves strictly better than equivalent
> "table filters". While it is also true that you can meaningfully
> classify "index quals" into their own hierarchy (namely access
> predicates vs index filter predicates), that doesn't necessarily need
> to be discussed when discussing the hierarchy from a planner point of
> view, since it is (at least for now) internal to the nbtree index AM.
> 
> On second thought, I tend to doubt that your patch needs to worry
> about each type of index qual directly. It probably needs to worry
> about index quals in general.
> 

I agree. That seems like a discussion relevant to the general topic of
"upgrading" clauses. If anything, the patch may need to worry about
moving table filters to index filters - that's the only thing it does.

> It is always better to make what could be an "index filter" into an
> index qual. Of course, the current practical problem for you is
> figuring out how to deal with that in cases like the BitmapOr case.
> Since it's not as if the planner is wrong, exactly...it really is the
> cheapest plan, so the planner is at least right on its own terms. I am
> interested in finding a solution to that problem.
> 

Well, if the planner is not wrong, what solution are we looking for? ;-)

FWIW the problem may be that the patch makes the new plan look cheaper
than some "actually better" plan (e.g. the BitmapOr one). In that case,
we could just keep the old costing (kinda assuming the worst case, but
as you said, the benefits are limited, while the risks are arbitrary).
That's the only idea I have.

>> FWIW this also reminds me that this whole discussion mostly focused on
>> SAOP clauses (and how they may be translated into access predicates
>> etc.). My patch is however way more general - it applies to all clauses,
>> not just SAOP ones, including clauses that have no chance of being
>> evaluated at the AM level.
> 
> I agree that your patch is way more general, in one way. But the SAOP
> patch is also much more general than you might think.
> 
> For one thing, the whole BitmapOr plan issue actually required that you
> compare your patch to a combination of my SAOP patch and Alena
> Rybakina's OR-to-SAOP transformation patch -- you needed both patches.
> Her patch effectively made my own patch much more general. But there
> are all kinds of other transformations that might further extend the
> applicability of nbtree executor changes from my patch -- the MDAM
> techniques are certainly not limited to ORs/SAOPs. For example, it's
> easy to imagine inventing a new type of SAOP-style clause that
> represented "BETWEEN x AND y" expressions, that would make range
> predicates into "first class indexable operators" --
> ScalarRangeOprExpr, or something. With multi-column indexes, these
> ScalarRangeOprExpr clauses could be composed beside ScalarArrayOpExpr
> clauses, as well as simpler clauses -- all while preserving index
> order on output. So quite a few plan shapes that aren't possible at all
> right now become possible, many of which don't even involve SAOPs/ORs.
> 
> Of course it won't ever be possible to create a transformation that
> doesn't ultimately flatten everything into MDAM style "single value"
> DNF predicates, which have to use simple B-Tree opclass operators --
> obviously there are fundamental limits to it. So even in a perfect
> world, with every possible MDAM-ish transformation implemented, we'll
> still have a significant need for your patch.
> 

Yup, that's exactly my point.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Sun, Aug 6, 2023 at 10:23 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> > The index AM is entitled to make certain assumptions of opclass
> > members -- assumptions that cannot be made during expression
> > evaluation.

> Thanks for reminding me, I keep forgetting about this.

I was almost certain that you already knew that, actually. It's easy
to forget such details in a discussion like this one, where the focus
zooms out and then zooms back in, again and again.

> Would it be possible to inspect the expression and determine if it's
> "safe" to be treated almost like an index qual? Say, given a complex
> expression, we might check if it consists only of expressions that could
> be treated as an index qual. So a bit like leak-proof, which may
> actually be relevant here too, I guess (int4div is not leak-proof,
> for example, so division-by-zero would not allow index-qual
> treatment).

Clearly you're talking about a set of guarantees distinct from the ones
that B-Tree opclasses make about not throwing errors when scanning
maybe-not-visible index tuples. The B-Tree opclass guarantees might
not even be written down anywhere -- they seem like common sense,
almost.

What you've described definitely seems like it could be very useful,
but I don't think that it solves the fundamental problem with cases
like the BitmapOr plan. Even if you do things in a way that precludes
the possibility of extra heap page accesses (when the VM bit isn't
set), you still have the problem of "access predicates vs index filter
predicates". Which is a big problem, in and of itself.

> Anyway, I think there are always going to be clauses that would be safe
> to evaluate on the index, but the index AM does not know how to handle them
> for some reason. For example it might require extending the AM to handle
> generic expressions, which doesn't seem quite desirable.

Actually, I mostly don't think we'd need to teach nbtree or other
index AMs anything about simplifying expressions. Structurally, we
should try to make things like ScalarArrayOpExpr into "just another
indexable operator", which has little to no difference with any other
indexable operator at runtime.

There probably are several good reasons why "normalizing to CNF" in
the planner is a good idea (to some extent we do this already). Alena
Rybakina's OR-to-SAOP transformation patch was written well before
anybody knew about the MDAM/SAOP patch I'm working on. The original
motivation was to lower expression evaluation overhead.

You could probably find a third or even a fourth reason to do that
specific transformation, if you thought about it for a while. Top-down
planner designs such as Cascades really have to spend a lot of time on
this kind of normalization process. For very general reasons -- many
of which are bound to apply in our own bottom-up planner design.

> So I think I see three points where we could evaluate expressions:
>
> 1) in the AM, as access predicates / index quals (doing this more often
>    is kinda what the SAOP patches aim to do)
>
> 2) in the index scan, before checking VM / visibility (if the qual is
>    safe to be evaluated early)
>
> 3) in the index scan, after checking VM / visibility (if the expression
>    does unsafe things)

Agreed.

Presumably there would also be a class of expressions that the patch
should make into index filters rather than table filters, despite
being unable to determine whether they're safe to evaluate early. Even
if there is only a small chance of it helping at runtime, there is no
reason (or an infinitesimally small reason) not to prefer
index filters where possible -- so it's desirable to always prefer
index filters, regardless of opclass/type restrictions on early
evaluation. Right?

Assuming the answer is yes, then I think that you still need all of
the index-only scan stuff that can "fall back to heap access", just to
be able to cover this other class of expression. I don't think that
this class that I've described will be rarely encountered, or
anything.

> I agree. That seems like a discussion relevant to the general topic of
> "upgrading" clauses. If anything, the patch may need to worry about
> moving table filters to index filters - that's the only thing it does.

Obviously that will have indirect consequences due to the changes in
the costing.

> > It is always better to make what could be an "index filter" into an
> > index qual. Of course, the current practical problem for you is
> > figuring out how to deal with that in cases like the BitmapOr case.
> > Since it's not as if the planner is wrong, exactly...it really is the
> > cheapest plan, so the planner is at least right on its own terms. I am
> > interested in finding a solution to that problem.
> >
>
> Well, if the planner is not wrong, what solution are we looking for? ;-)

I imagine that you really don't want to have to rely on some
wishy-washy philosophical argument about the planner's expectation
being the only reasonable basis for choosing a plan. Just as I don't
want to have to rely on some similarly hand-wavy argument about risk.
The ideal outcome is one that doesn't require any of that, from either
of us.

I believe that the patch that I'm working on can allow us to totally
avoid it. I hesitate to say this, since it might sound like I'm
imposing conditions in a self-interested way. AFAICT it really does
provide us with a practical way of just not having to go down the road
that nobody wants to go down. So I am just about ready to say that I
believe that that will end up being the solution we use. It just seems
to make sense.

By normalizing to CNF, the planner is given the ability to work with
higher-level index paths, that abstract-away inessential "physical
plan" differences. Suppose, for example, we're building index paths
for a scan that comes from an SAOP that was generated through OR
transformation. But, it's an index AM that lacks native support for
SAOPs -- not nbtree. That index path will still end up using a
BitmapOr, in the end, since it'll ultimately have to compensate for
the lack of index AM infrastructure. So the final "physical plan" will
be exactly the same as today -- the OR transformation will actually
have changed nothing about the physical plan in these sorts of cases.

The CNF transformation process just puts what was already true ("these
two plans are logically equivalent") on a more formal footing -- and
so implicitly avoids "risky plans" like the one we've discussed. We'll
only be relying on the nbtree work to get those small efficiencies
that come from avoiding duplicative primitive index scans. Since that
was never actually a goal of your patch to begin with, limiting that
benefit to nbtree scans (where we can do it without novel risks) seems
more than acceptable.

Since you're not relying on the nbtree work at all here, really (just
on the transformation process itself), the strategic risk that this
adds to your project isn't too great. It's not like this ties the
success of your patch to the success of my own patch. At most it ties
the success of your patch to something like Alena Rybakina's
OR-to-SAOP transformation patch, which seems manageable to me. (To be
clear, I'm not relying on that work in the same way myself -- for my
patch the transformation process is just a nice-to-have.)

> FWIW the problem may be that the patch makes the new plan look cheaper
> than some "actually better" plan (e.g. the BitmapOr one). In that case,
> we could just keep the old costing (kinda assuming the worst case, but
> as you said, the benefits are limited, while the risks are arbitrary).
> That's the only idea I have.

That's almost ideal for my patch. I should be able to pursue my patch
without concern about your patch interfering too much -- presumably my
patch will be able to generate the cheapest plan in cases like the
BitmapOr case. It'll at least be slightly cheaper in physical reality,
and now you'll artificially penalize the paths that your patch
generates -- so cases like the BitmapOr case should do the right thing
by seeing the SAOP path as cheaper during planning.

But is that approach really ideal for your patch? I doubt it. Why
wouldn't we want to give the index paths with index filters credit for
being cheaper in almost all cases? That's why this doesn't feel like a
costing issue to me (it's more of a "just don't do that" issue, I
think). Your patch seems too important to nerf like this, even if it's
convenient in some ways.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Sun, Aug 6, 2023 at 1:13 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Since you're not relying on the nbtree work at all here, really (just
> on the transformation process itself), the strategic risk that this
> adds to your project isn't too great. It's not like this ties the
> success of your patch to the success of my own patch. At most it ties
> the success of your patch to something like Alena Rybakina's
> OR-to-SAOP transformation patch, which seems manageable to me. (To be
> clear, I'm not relying on that work in the same way myself -- for my
> patch the transformation process is just a nice-to-have.)

I decided to verify my understanding by checking what would happen
when I ran the OR-heavy tenk1 regression test query against a
combination of your patch, and v7 of the OR-to-SAOP transformation
patch. (To be clear, this is without my patch.)

I found that the problem that I saw with the OR-heavy tenk1 regression
test goes away (though only when I "set or_transform_limit=0"). That
is, we'll get an index scan plan that uses a SAOP. This index scan
plan is comparable to the master branch's BitmapOr scan. In
particular, both plans get 7 buffer hits. More importantly, the new
plan is (like the master branch plan) not risky in the way I've been
going on about.
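
For reference, this is what I ran (reconstructing the regression query
here; or_transform_limit is the GUC added by the v7 transformation
patch):

  set or_transform_limit = 0;

  explain (analyze, buffers)
  select * from tenk1
  where thousand = 42
    and (tenthous = 1 or tenthous = 3 or tenthous = 42);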

This does mean that your patch gets a *slightly* slower plan, due to
the issue of added index page accesses. Improving that should be a job
for my patch -- it's not your problem, since there is no regression.

I'm not sure if it's somehow still possible that SAOP expression
evaluation is able to "do the risky thing" in the same way that your
patch's "Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3)
OR (tenk1.tenthous = 42))" plan can. But it certainly looks like it can't.
Increasingly, the problem here appears to me to be a problem of
lacking useful CNF transformations/normalization -- nothing more.
Structuring things so that we reliably use "the native representation
of ORs" via normalization seems likely to be all you really need.

We may currently be over-relying on a similar process that happens
indirectly, via BitmapOr paths. I suspect that you're right to be
concerned about how this might already be affecting index-only scans.
Once we have CNF normalization/transformation in place, we will of
course continue to use some BitmapOr plans that may look a little like
the ones I'm focussed on, and some plans that use "index filters" (due
to your patch) that are also a little like that. But there's nothing
objectionable about those cases IMV (quite the opposite), since there
is no question of displacing/out-competing very similar plans that can
use index quals. (You might also find a way to avoid ever requiring
heap access/visibility checks for a subset of the "index filter" cases
where it is determined to be safe up front, but that's just a bonus.)

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Sun, Aug 6, 2023 at 3:28 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I decided to verify my understanding by checking what would happen
> when I ran the OR-heavy tenk1 regression test query against a
> combination of your patch, and v7 of the OR-to-SAOP transformation
> patch. (To be clear, this is without my patch.)

I also spotted what looks like it might be a problem with your patch
when looking at this query (hard to be sure if it's truly a bug,
though).

I manually SAOP-ify the OR-heavy tenk1 regression test query like so:

select
  *
from
  tenk1
where
  thousand = 42
  and tenthous in (1, 3, 42);

Sure enough, I continue to get 7 buffer hits with this query. Just
like with the BitmapOr plan (and exactly like the original query with
the OR-to-SAOP transformation patch in place).

As I continue to add SAOP constants to the original "tenthous" IN(),
eventually the planner switches over to not using index quals on the
"tenthous" low order index column (they're only used on the high order
"thousand" index column). Here's where the switch to only using the
leading column from the index happens for me:

select
  *
from
  tenk1
where
  thousand = 42
  and
  tenthous in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50);

This plan switchover isn't surprising in itself -- it's one of the most
important issues addressed by my SAOP patch. However, it *is* a little
surprising that your patch doesn't even manage to use "Index Filter"
quals. It appears that it is only capable of using table filter quals.
Obviously, the index has all the information that expression
evaluation needs, and yet I see "Filter: (tenk1.tenthous = ANY
('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))". So no improvement
over master here.

Interestingly enough, your patch only has this problem with SAOPs, at
least that I know of -- the spelling/style matters. If I add many
additional "tenthous" constants to the original version of the query
from the regression tests in the same way, but using the "longform"
(tenthous = 1 or tenthous = 3 ...) spelling, then your patch does
indeed use index filters/expression evaluation. Just like the original
"risky" plan (it's just a much bigger expression, with many more ORs).

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Tomas Vondra
Date:

On 8/7/23 02:38, Peter Geoghegan wrote:
> On Sun, Aug 6, 2023 at 3:28 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> I decided to verify my understanding by checking what would happen
>> when I ran the OR-heavy tenk1 regression test query against a
>> combination of your patch, and v7 of the OR-to-SAOP transformation
>> patch. (To be clear, this is without my patch.)
> 
> I also spotted what looks like it might be a problem with your patch
> when looking at this query (hard to be sure if it's truly a bug,
> though).
> 
> I manually SAOP-ify the OR-heavy tenk1 regression test query like so:
> 
> select
>   *
> from
>   tenk1
> where
>   thousand = 42
>   and tenthous in (1, 3, 42);
> 
> Sure enough, I continue to get 7 buffer hits with this query. Just
> like with the BitmapOr plan (and exactly like the original query with
> the OR-to-SAOP transformation patch in place).
> 
> As I continue to add SAOP constants to the original "tenthous" IN(),
> eventually the planner switches over to not using index quals on the
> "tenthous" low order index column (they're only used on the high order
> "thousand" index column). Here's where the switch to only using the
> leading column from the index happens for me:
> 
> select
>   *
> from
>   tenk1
> where
>   thousand = 42
>   and
>   tenthous in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50);
> 
> This plan switchover isn't surprising in itself -- it's one of the most
> important issues addressed by my SAOP patch. However, it *is* a little
> surprising that your patch doesn't even manage to use "Index Filter"
> quals. It appears that it is only capable of using table filter quals.
> Obviously, the index has all the information that expression
> evaluation needs, and yet I see "Filter: (tenk1.tenthous = ANY
> ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))". So no improvement
> over master here.
> 
> Interestingly enough, your patch only has this problem with SAOPs, at
> least that I know of -- the spelling/style matters. If I add many
> additional "tenthous" constants to the original version of the query
> from the regression tests in the same way, but using the "longform"
> (tenthous = 1 or tenthous = 3 ...) spelling, then your patch does
> indeed use index filters/expression evaluation. Just like the original
> "risky" plan (it's just a much bigger expression, with many more ORs).
> 

Right. This happens because the matching of SAOPs to indexes happens in
multiple places. Firstly, create_index_paths() matches the clauses to
the index by calling

    match_restriction_clauses_to_index
     -> match_clauses_to_index
         -> match_clause_to_index

Which is where we also decide which *unmatched* clauses can be filters.
And this *does* match the SAOP to the index key, hence no index filter.

But then we call get_index_paths/build_index_path a little bit later,
and that decides to skip "lower SAOP" (which seems a bit strange,
because the column is "after" the equality, but meh). Anyway, at this
point we already decided what's a filter, ignoring the index clauses,
and not expecting any backsies.

The simplest fix seems to be to add these skipped SAOP clauses as
filters. We know they can be evaluated on the index ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Mon, Aug 7, 2023 at 12:34 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> But then we call get_index_paths/build_index_path a little bit later,
> and that decides to skip "lower SAOP" (which seems a bit strange,
> because the column is "after" the equality, but meh). Anyway, at this
> point we already decided what's a filter, ignoring the index clauses,
> and not expecting any backsies.

I'm not surprised that it's due to the issue around "lower SAOP"
clauses within get_index_paths/build_index_path. That whole approach
seems rather ad-hoc to me. As you probably realize already, my own
patch has to deal with lots of issues in the same area.

> The simplest fix seems to be to add these skipped SAOP clauses as
> filters. We know they can be evaluated on the index ...

Right. Obviously, my preferred solution to the problem at hand is to
make everything into index quals via an approach like the one from my
patch -- that works sensibly, no matter the length of the SAOP arrays.
But even if you're willing to assume that that work will be in place
for 17, there are still certain remaining gaps, that also seem
important.

Even my patch cannot always make SAOP clauses into index quals. There
are specific remaining gaps that I hope that your patch will still
cover. The simplest example is a similar NOT IN() inequality, like
this:

select
  ctid, *
from
  tenk1
where
  thousand = 42
  and
  tenthous not in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50);

There is no way that my patch can handle this case. And your patch
seems to be unable to do better than master here, either -- just like
with the "tenthous in ( )" variant. Once again, the inequality SAOP
also ends up as table filter quals, not index filter quals.

It would also be nice if we found a way of doing this, while still
reliably avoiding all visibility checks (just like "real index quals"
will) -- since that should be safe in this specific case.

The MDAM paper describes a scheme for converting NOT IN() clauses into
DNF single value predicates. But that's not going to happen for 17,
and doesn't seem all that helpful with a query like this in any case.
But it does suggest an argument in favor of visibility checks not
being truly required for SAOP inequalities like this one (when they
appear in index filters). I'm not sure if that idea is too particular
to SAOP inequalities to be interesting -- just a suggestion.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From:
Peter Geoghegan
Date:
On Mon, Aug 7, 2023 at 3:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Even my patch cannot always make SAOP clauses into index quals. There
> are specific remaining gaps that I hope that your patch will still
> cover. The simplest example is a similar NOT IN() inequality, like
> this:
>
> select
>   ctid, *
> from
>   tenk1
> where
>   thousand = 42
>   and
>   tenthous not in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50);
>
> There is no way that my patch can handle this case. And your patch
> seems to be unable to do better than master here, either -- just like
> with the "tenthous in ( )" variant. Once again, the inequality SAOP
> also ends up as table filter quals, not index filter quals.
>
> It would also be nice if we found a way of doing this, while still
> reliably avoiding all visibility checks (just like "real index quals"
> will) -- since that should be safe in this specific case.

Actually, this isn't limited to SAOP inequalities. It appears as if
*any* simple inequality has the same limitation. So, for example, the
following query can only use table filters with the patch (never index
filters):

select
  ctid, *
from
  tenk1
where
  thousand = 42 and tenthous != 1;

This variant will use index filters, as expected (though with some
risk of heap accesses when VM bits aren't set):

select
  ctid, *
from
  tenk1
where
  thousand = 42 and tenthous is distinct from 1;

Offhand I suspect that it's a similar issue to the one you described for SAOPs.

I see that get_op_btree_interpretation() will treat != as a kind of
honorary member of an opfamily whose = operator has our != operator as
its negator. Perhaps we should be finding a way to pass != quals into
the index AM so that they become true index quals (obviously they
would only be index filter predicates, never access predicates). That
has the advantage of working in a way that's analogous to the way that
index quals already avoid visibility checks.
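
The negator linkage is already visible in the catalogs -- for int4,
for example:

  select oprname, oprnegate::regoperator
  from pg_operator
  where oprname = '='
    and oprleft = 'int4'::regtype and oprright = 'int4'::regtype;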

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:

On 8/8/23 00:18, Peter Geoghegan wrote:
> On Mon, Aug 7, 2023 at 12:34 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> But then we call get_index_paths/build_index_path a little bit later,
>> and that decides to skip "lower SAOP" (which seems a bit strange,
>> because the column is "after" the equality, but meh). Anyway, at this
>> point we already decided what's a filter, ignoring the index clauses,
>> and not expecting any backsies.
> 
> I'm not surprised that it's due to the issue around "lower SAOP"
> clauses within get_index_paths/build_index_path. That whole approach
> seems rather ad-hoc to me. As you probably realize already, my own
> patch has to deal with lots of issues in the same area.
> 

Yeah. It'd be much easier if the decision were made in one place,
without changing it later.

>> The simplest fix seems to be to add these skipped SAOP clauses as
>> filters. We know they can be evaluated on the index ...
> 
> Right. Obviously, my preferred solution to the problem at hand is to
> make everything into index quals via an approach like the one from my
> patch -- that works sensibly, no matter the length of the SAOP arrays.
> But even if you're willing to assume that that work will be in place
> for 17, there are still certain remaining gaps, that also seem
> important.
> 

Agreed.

> Even my patch cannot always make SAOP clauses into index quals. There
> are specific remaining gaps that I hope that your patch will still
> cover. The simplest example is a similar NOT IN() inequality, like
> this:
> 
> select
>   ctid, *
> from
>   tenk1
> where
>   thousand = 42
>   and
>   tenthous not in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50);
> 
> There is no way that my patch can handle this case. And your patch
> seems to be unable to do better than master here, either -- just like
> with the "tenthous in ( )" variant. Once again, the inequality SAOP
> also ends up as table filter quals, not index filter quals.
> 

Are you sure? Because if I try with the 20230716 patch, I get this plan
(after disabling bitmapscan):

                             QUERY PLAN
-------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1  (cost=0.31..44.54 rows=10 width=250)
   Index Cond: (thousand = 42)
   Index Filter: (tenthous <> ALL ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))
   Filter: (tenthous <> ALL ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))
(4 rows)

So the condition is recognized as an index filter. Or did you mean
something different?

> It would also be nice if we found a way of doing this, while still
> reliably avoiding all visibility checks (just like "real index quals"
> will) -- since that should be safe in this specific case.
> 
> The MDAM paper describes a scheme for converting NOT IN() clauses into
> DNF single value predicates. But that's not going to happen for 17,
> and doesn't seem all that helpful with a query like this in any case.
> But it does suggest an argument in favor of visibility checks not
> being truly required for SAOP inequalities like this one (when they
> appear in index filters). I'm not sure if that idea is too particular
> to SAOP inequalities to be interesting -- just a suggestion.
> 

Not sure. A couple messages back I suggested that maybe there is a way
to check which expressions would be safe to evaluate before checking
visibility. This seems similar, although what you're suggesting really
applies to the "transformed" SAOP, and I'm not sure it can be extended
to the original SAOP.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:
On 8/8/23 06:21, Peter Geoghegan wrote:
> On Mon, Aug 7, 2023 at 3:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> Even my patch cannot always make SAOP clauses into index quals. There
>> are specific remaining gaps that I hope that your patch will still
>> cover. The simplest example is a similar NOT IN() inequality, like
>> this:
>>
>> select
>>   ctid, *
>> from
>>   tenk1
>> where
>>   thousand = 42
>>   and
>>   tenthous not in (1, 3, 42, 43, 44, 45, 46, 47, 48, 49, 50);
>>
>> There is no way that my patch can handle this case. And your patch
>> seems to be unable to do better than master here, either -- just like
>> with the "tenthous in ( )" variant. Once again, the inequality SAOP
>> also ends up as table filter quals, not index filter quals.
>>
>> It would also be nice if we found a way of doing this, while still
>> reliably avoiding all visibility checks (just like "real index quals"
>> will) -- since that should be safe in this specific case.
> 
> Actually, this isn't limited to SAOP inequalities. It appears as if
> *any* simple inequality has the same limitation. So, for example, the
> following query can only use table filters with the patch (never index
> filters):
> 
> select
>   ctid, *
> from
>   tenk1
> where
>   thousand = 42 and tenthous != 1;
> 
> This variant will use index filters, as expected (though with some
> risk of heap accesses when VM bits aren't set):
> 
> select
>   ctid, *
> from
>   tenk1
> where
>   thousand = 42 and tenthous is distinct from 1;
> 
> Offhand I suspect that it's a similar issue to the one you described for SAOPs.
> 
> I see that get_op_btree_interpretation() will treat != as a kind of
> honorary member of an opfamily whose = operator has our != operator as
> its negator. Perhaps we should be finding a way to pass != quals into
> the index AM so that they become true index quals (obviously they
> would only be index filter predicates, never access predicates). That
> has the advantage of working in a way that's analogous to the way that
> index quals already avoid visibility checks.
> 

Are you sure you're using the right build? Because I get this plan:

                             QUERY PLAN
-------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on tenk1  (cost=0.29..44.48 rows=10 width=250)
   Index Cond: (thousand = 42)
   Index Filter: (tenthous <> 1)
   Filter: (tenthous <> 1)
(4 rows)

Again, the inequality is clearly recognized as an index filter.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Tue, Aug 8, 2023 at 7:28 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Are you sure? Because if I try with the 20230716 patch, I get this plan
> (after disabling bitmapscan):

> So the condition is recognized as an index filter. Or did you mean
> something different?

No, I was just wrong about this (and about inequalities in general). I
now see why the planner preferred a bitmap scan, which makes sense.
Apologies.

> Not sure. A couple messages back I suggested that maybe there is a way
> to check which expressions would be safe to evaluate before checking
> visibility. This seems similar, although what you're suggesting really
> applies to the "transformed" SAOP, and I'm not sure it can be extended
> to the original SAOP.

The transformation doesn't necessarily have to happen in order for it
to be possible in principle (and correct). My point was that there are
a handful of important types of expressions (SAOPs among them, but
possibly also RowCompareExpr and IS NULL tests) that are "index quals
in spirit". These expressions therefore don't seem to need visibility
checks at all -- the index qual guarantees "apply transitively".
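
For example (assuming the usual index on (thousand, tenthous)), each of
these is evaluable against the index tuple alone:

  select * from tenk1 where (thousand, tenthous) >= (42, 100);        -- RowCompareExpr
  select * from tenk1 where thousand = 42 and tenthous is null;       -- IS NULL test
  select * from tenk1 where thousand = 42 and tenthous in (1, 3, 42); -- SAOP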

It's possible that an approach that focuses on leakproof quals won't
have any problems, and will be strictly better than "extending the
index qual guarantees to index-qual-like expressions". Really not sure
about that.

In any case I see a huge amount of value in differentiating between
cases that need visibility checks (if only via the VM) and those that
do not, ever. I'm speaking very generally here -- nothing to do with
my adversarial tenk1 test case. It's weird that index quals have such
a massive advantage over even simple index filters -- that feels
artificial. I suggest that you focus on that aspect, since it has the
potential to make what is already a compelling patch into a much more
compelling patch.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Mon, Aug 7, 2023 at 9:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I see that get_op_btree_interpretation() will treat != as a kind of
> honorary member of an opfamily whose = operator has our != operator as
> its negator. Perhaps we should be finding a way to pass != quals into
> the index AM so that they become true index quals (obviously they
> would only be index filter predicates, never access predicates). That
> has the advantage of working in a way that's analogous to the way that
> index quals already avoid visibility checks.

The approach in your patch can only really work with index scans (and
index-only scans). So while it is more general than true index quals
in some ways, it's also less general in other ways: it cannot help
bitmap index scans.

While I accept that the inability of bitmap index scans to use index
filters in this way is, to some degree, a natural and inevitable
downside of bitmap index scans, that isn't always true. For example,
it doesn't seem to be the case with simple inequalities. Bitmap index
scans argue for making cases involving quals that are "index quals in
spirit" into actual index quals. Even if you can reliably avoid extra
heap accesses for plain index scans using expression evaluation, I
can't see that working for bitmap index scans.

More concretely, if we have an index on "tenk1 (four, two)", then we
miss out on the opportunity to eliminate heap accesses for a query
like this one:

select
  ctid, *
from
  tenk1
where
  four = 1 and two != 1;

This will get a bitmap index scan plan (that uses our composite
index), which makes sense overall. But the details beyond that make no
sense -- since we're using table filter quals for "two". It turns out
that the bitmap heap scan will access every single heap page in the
tenk1 table as a result, even though we could have literally avoided
all heap accesses had we been able to push down the != as an index
qual. This is a difference in "buffers hit" that is close to 2 orders
of magnitude.
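
To reproduce (the composite index and its name are mine to invent here,
and exact buffer counts will vary):

  create index tenk1_four_two on tenk1 (four, two);
  explain (analyze, buffers)
  select ctid, * from tenk1 where four = 1 and two != 1;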

I'd be okay with treating these cases as out of scope for this patch,
but we should probably agree on the parameters. The patch certainly
shouldn't make it any harder to fix cases such as this.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:

On 8/8/23 19:43, Peter Geoghegan wrote:
> On Mon, Aug 7, 2023 at 9:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> I see that get_op_btree_interpretation() will treat != as a kind of
>> honorary member of an opfamily whose = operator has our != operator as
>> its negator. Perhaps we should be finding a way to pass != quals into
>> the index AM so that they become true index quals (obviously they
>> would only be index filter predicates, never access predicates). That
>> has the advantage of working in a way that's analogous to the way that
>> index quals already avoid visibility checks.
> 
> The approach in your patch can only really work with index scans (and
> index-only scans). So while it is more general than true index quals
> in some ways, it's also less general in other ways: it cannot help
> bitmap index scans.
> 
> While I accept that the inability of bitmap index scans to use index
> filters in this way is, to some degree, a natural and inevitable
> downside of bitmap index scans, that isn't always true. For example,
> it doesn't seem to be the case with simple inequalities. Bitmap index
> scans argue for making cases involving quals that are "index quals in
> spirit" into actual index quals. Even if you can reliably avoid extra
> heap accesses for plain index scans using expression evaluation, I
> can't see that working for bitmap index scans.
> 
> More concretely, if we have an index on "tenk1 (four, two)", then we
> miss out on the opportunity to eliminate heap accesses for a query
> like this one:
> 
> select
>   ctid, *
> from
>   tenk1
> where
>   four = 1 and two != 1;
> 
> This will get a bitmap index scan plan (that uses our composite
> index), which makes sense overall. But the details beyond that make no
> sense -- since we're using table filter quals for "two". It turns out
> that the bitmap heap scan will access every single heap page in the
> tenk1 table as a result, even though we could have literally avoided
> all heap accesses had we been able to push down the != as an index
> qual. This is a difference in "buffers hit" that is close to 2 orders
> of magnitude.
> 
> I'd be okay with treating these cases as out of scope for this patch,
> but we should probably agree on the parameters. The patch certainly
> shouldn't make it any harder to fix cases such as this.
> 

I agree this patch shouldn't make it harder to improve these cases, but
TBH I don't quite see which part of the patch would do that? Which bit
are you objecting to? If we decide to change how match_clause_to_index()
deals with these cases, to recognize them as index quals, the patch will
be working just fine.

The only thing the patch does is look at clauses we decided not to
treat as index quals, and maybe still evaluate them on the index. And I
don't think I want to move these goalposts much further.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Tue, Aug 8, 2023 at 11:14 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I agree this patch shouldn't make it harder to improve these cases, but
> TBH I don't quite see which part of the patch would do that? Which bit
> are you objecting to? If we decide to change how match_clause_to_index()
> deals with these cases, to recognize them as index quals, the patch will
> be working just fine.

Well, I also recently said that I think that it's important that you
figure out a way to reliably avoid visibility checks, in cases where
it can be avoided entirely -- since that can lead to huge savings in
heap accesses. You haven't done that yet, but you really should look
into it IMV.

Assuming that that happens, then it immediately gives index scans a
huge advantage over bitmap index scans. At that point it seems
important to describe (in high level terms) where it is that the
advantage is innate, and where it's just because we haven't done the
required work for bitmap index scans. I became confused on this point
myself yesterday. Admittedly I should have been able to figure it out
on my own -- but it is confusing.

> The only thing the patch does is look at clauses we decided not to
> treat as index quals, and maybe still evaluate them on the index. And I
> don't think I want to move these goalposts much further.

Avoiding the need for visibility checks completely (in at least a
subset of cases) was originally your idea. I'm not going to insist on
it, or anything like that. It just seems like something that'll add a
great deal of value over what you have already.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Assuming that that happens, then it immediately gives index scans a
> huge advantage over bitmap index scans. At that point it seems
> important to describe (in high level terms) where it is that the
> advantage is innate, and where it's just because we haven't done the
> required work for bitmap index scans. I became confused on this point
> myself yesterday. Admittedly I should have been able to figure it out
> on my own -- but it is confusing.

I also have some doubts about the costing. That contributed to my confusion.

Take my " four = 1 and two != 1" example query, from earlier today. As
I said, that gets a bitmap index scan, which does a hugely excessive
amount of heap access. But once I force the planner to use an index
scan, then (as predicted) there are useful index filters -- filters
that can eliminate 100% of all heap accesses. Yet the planner still
thinks that the total cost of the bitmap scan plan is only 415.28,
versus 714.89 for the index scan plan. Perhaps that's just because
this is a tricky case, for whatever reason...but it's not obvious what
that reason really is.

You keep pointing out that your patch only makes isolated, local
changes to certain specific plans. While that is true, it's also true
that there will be fairly far removed consequences. Why shouldn't I
treat those things as in scope?

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:

On 8/8/23 20:36, Peter Geoghegan wrote:
> On Tue, Aug 8, 2023 at 11:14 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I agree this patch shouldn't make it harder to improve these cases, but
>> TBH I don't quite see which part of the patch would do that? Which bit
>> are you objecting to? If we decide to change how match_clause_to_index()
>> deals with these cases, to recognize them as index quals, the patch will
>> be working just fine.
> 
> Well, I also recently said that I think that it's important that you
> figure out a way to reliably avoid visibility checks, in cases where
> it can be avoided entirely -- since that can lead to huge savings in
> heap accesses. You haven't done that yet, but you really should look
> into it IMV.
> 
> Assuming that that happens, then it immediately gives index scans a
> huge advantage over bitmap index scans. At that point it seems
> important to describe (in high level terms) where it is that the
> advantage is innate, and where it's just because we haven't done the
> required work for bitmap index scans. I became confused on this point
> myself yesterday. Admittedly I should have been able to figure it out
> on my own -- but it is confusing.
> 

Yeah, I agree that might help a lot, particularly for tables that have a
significant fraction of not-all-visible pages.

>> The only thing the patch does is look at clauses we decided not to
>> treat as index quals, and maybe still evaluate them on the index. And I
>> don't think I want to move these goalposts much further.
> 
> Avoiding the need for visibility checks completely (in at least a
> subset of cases) was originally your idea. I'm not going to insist on
> it, or anything like that. It just seems like something that'll add a
> great deal of value over what you have already.
> 

Right, and I'm not against improving that, but I see it more like an
independent task. I don't think it needs to (or should) be part of this
patch - skipping visibility checks would apply to IOS, while this is
aimed only at plain index scans.

Furthermore, I don't have a very good idea how to do that (except maybe
for relying on the leakproof flag).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:
On 8/8/23 21:15, Peter Geoghegan wrote:
> On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
>> Assuming that that happens, then it immediately gives index scans a
>> huge advantage over bitmap index scans. At that point it seems
>> important to describe (in high level terms) where it is that the
>> advantage is innate, and where it's just because we haven't done the
>> required work for bitmap index scans. I became confused on this point
>> myself yesterday. Admittedly I should have been able to figure it out
>> on my own -- but it is confusing.
> 
> I also have some doubts about the costing. That contributed to my confusion.
> 
> Take my " four = 1 and two != 1" example query, from earlier today. As
> I said, that gets a bitmap index scan, which does a hugely excessive
> amount of heap access. But once I force the planner to use an index
> scan, then (as predicted) there are useful index filters -- filters
> that can eliminate 100% of all heap accesses. Yet the planner still
> thinks that the total cost of the bitmap scan plan is only 415.28,
> versus 714.89 for the index scan plan. Perhaps that's just because
> this is a tricky case, for whatever reason...but it's not obvious what
> that reason really is.
> 

Right. I haven't checked how the costs are calculated in these cases,
but I'd bet it's a combination of having correlated conditions, and the
bitmap costing being fairly rough (with plenty of constants etc).

The correlation seems like an obvious culprit, considering the explain says

 Bitmap Heap Scan on public.tenk1  (cost=31.35..413.85 rows=1250 width=250) (actual time=2.698..2.703 rows=0 loops=1)

So we expect 1250 rows. (In tenk1, four = 1 implies two = 1, so the two
clauses are perfectly correlated and no rows can qualify, while the
planner multiplies the selectivities as if they were independent:
10000 * 1/4 * 1/2 = 1250.) If that estimate was accurate, the index scan
would have to do 1250 heap fetches. It's just luck the index scan doesn't
need to do that. I don't think there's a chance to improve this costing -
if the inputs are this off, it can't do anything.

Also, I think this is related to the earlier discussion about maybe
costing it according to the worst case - i.e. as if we still needed to
fetch the same number of heap tuples as before. Which will inevitably
lead to similar issues, with worse plans looking cheaper.

> You keep pointing out that your patch only makes isolated, local
> changes to certain specific plans. While that is true, it's also true
> that there will be fairly far removed consequences. Why shouldn't I
> treat those things as in scope?
> 

That is certainly true - I'm trying to keep the scope somewhat close to
the original goal. Obviously, there may be additional things the patch
really needs to consider, but I'm not sure this is one of those cases
(perhaps I just don't understand what the issue is - the example seems
like a run-of-the-mill case of poor estimate / costing).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Tue, Aug 8, 2023 at 1:24 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> > Assuming that that happens, then it immediately gives index scans a
> > huge advantage over bitmap index scans. At that point it seems
> > important to describe (in high level terms) where it is that the
> > advantage is innate, and where it's just because we haven't done the
> > required work for bitmap index scans. I became confused on this point
> > myself yesterday. Admittedly I should have been able to figure it out
> > on my own -- but it is confusing.
> >
>
> Yeah, I agree that might help a lot, particularly for tables that have a
> significant fraction of not-all-visible pages.

It also has the potential to make the costing a lot easier in certain
important cases. Accurately deriving just how many heap accesses can
be avoided via the VM from the statistics that are available to the
planner is likely always going to be very difficult. Finding a way to
make that just not matter at all (in these important cases) can also
make it safe to bias the costing, such that the planner tends to favor
index scans (and index-only scans) over bitmap index scans that cannot
possibly eliminate any heap page accesses via an index filter qual.

> Right, and I'm not against improving that, but I see it more like an
> independent task. I don't think it needs to (or should) be part of this
> patch - skipping visibility checks would apply to IOS, while this is
> aimed only at plain index scans.

I'm certainly not going to insist on it. Worth considering if putting
it in scope could make certain aspects of this patch (like the
costing) easier, though.

I think that it wouldn't be terribly difficult to make simple
inequalities into true index quals. I think I'd like to have a go at
it myself. To some degree I'm trying to get a sense of how much that'd
help you.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Tue, Aug 8, 2023 at 1:49 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> So we expect 1250 rows. If that was accurate, the index scan would have
> to do 1250 heap fetches. It's just luck the index scan doesn't need to
> do that. I don't think there's a chance to improve this costing - if the
> inputs are this off, it can't do anything.

Well, that depends. If we can find a way to make the bitmap index scan
capable of doing something like the same trick through other means, in
some other patch, then this particular problem (involving a simple
inequality) just goes away. There may be other cases that look a
little similar, with a more complicated expression, where it just
isn't reasonable to expect a bitmap index scan to compete. Ideally,
bitmap index scans will only be at a huge disadvantage when it just
makes sense, due to the particulars of the expression.

I'm not trying to make this your problem. I'm just trying to establish
the general nature of the problem.

> Also, I think this is related to the earlier discussion about maybe
> costing it according to the worst case - i.e. as if we still needed
> to fetch the same number of heap tuples as before. Which will inevitably
> lead to similar issues, with worse plans looking cheaper.

Not in those cases where it just doesn't come up, because we can
totally avoid visibility checks. As I said, securing that guarantee
has the potential to make the costing a lot more reliable/easier to
implement.

> That is certainly true - I'm trying to keep the scope somewhat close to
> the original goal. Obviously, there may be additional things the patch
> really needs to consider, but I'm not sure this is one of those cases
> (perhaps I just don't understand what the issue is - the example seems
> like a run-of-the-mill case of poor estimate / costing).

I'm not trying to impose any particular interpretation here. It's
early in the cycle, and my questions are mostly exploratory. I'm still
trying to develop my own understanding of the trade-offs in this area.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:

On 8/8/23 23:03, Peter Geoghegan wrote:
> On Tue, Aug 8, 2023 at 1:49 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> So we expect 1250 rows. If that was accurate, the index scan would have
>> to do 1250 heap fetches. It's just luck the index scan doesn't need to
>> do that. I don't think there's a chance to improve this costing - if the
>> inputs are this off, it can't do anything.
> 
> Well, that depends. If we can find a way to make the bitmap index scan
> capable of doing something like the same trick through other means, in
> some other patch, then this particular problem (involving a simple
> inequality) just goes away. There may be other cases that look a
> little similar, with a more complicated expression, where it just
> isn't reasonable to expect a bitmap index scan to compete. Ideally,
> bitmap index scans will only be at a huge disadvantage when it just
> makes sense, due to the particulars of the expression.
> 
> I'm not trying to make this your problem. I'm just trying to establish
> the general nature of the problem.
> 
>> Also, I think this is related to the earlier discussion about maybe
>> costing it according to the worst case - i.e. as if we still needed
>> to fetch the same number of heap tuples as before. Which will inevitably
>> lead to similar issues, with worse plans looking cheaper.
> 
> Not in those cases where it just doesn't come up, because we can
> totally avoid visibility checks. As I said, securing that guarantee
> has the potential to make the costing a lot more reliable/easier to
> implement.
> 

But in the example you shared yesterday, the problem is not really about
visibility checks. In fact, the index scan costing completely ignores
the VM checks - it didn't matter before, and the patch did not change
this. It's about the number of rows the index scan is expected to
produce - and those will always incur random I/O; we can't skip those.

>> That is certainly true - I'm trying to keep the scope somewhat close to
>> the original goal. Obviously, there may be additional things the patch
>> really needs to consider, but I'm not sure this is one of those cases
>> (perhaps I just don't understand what the issue is - the example seems
>> like a run-of-the-mill case of poor estimate / costing).
> 
> I'm not trying to impose any particular interpretation here. It's
> early in the cycle, and my questions are mostly exploratory. I'm still
> trying to develop my own understanding of the trade-offs in this area.
> 

Understood. I think this whole discussion is about figuring out these
trade-offs and also how to divide the various improvements into "minimum
viable" changes.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Wed, Aug 9, 2023 at 9:05 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> But in the example you shared yesterday, the problem is not really about
> visibility checks. In fact, the index scan costing completely ignores
> the VM checks - it didn't matter before, and the patch did not change
> this. It's about the number of rows the index scan is expected to
> produce - and those will always incur random I/O; we can't skip those.

I wasn't really talking about that example here.

As I see it, the problem from my example is that plain index scans had
an "unnatural" advantage over bitmap index scans. There was no actual
reason why the system couldn't just deal with the inequality on the
second column uniformly, so that index scans and bitmap index scans
both filtered out all non-matches inexpensively, without heap access.
Then the costing could have been quite off, and it really wouldn't
have mattered at runtime, because the index scan and bitmap index scan
would do approximately the same thing in any case.

As I've said, it's obviously also true that there are many other cases
where there really will be a "natural" advantage for index scans, that
bitmap index scans just cannot hope to offer. These are the cases
where the mechanism from your patch is best placed to be the thing
that avoids heap accesses, or maybe even avoid all visibility checks
(despite not using true index quals).

> Understood. I think this whole discussion is about figuring out these
> trade-offs and also how to divide the various improvements into "minimum
> viable" changes.

That's exactly how I see it myself.

Obviously, there is still plenty of gray area here -- cases where it's
not at all clear whether or not we should rely on the mechanism from
your patch, or whether we should provide some alternative, more
specialized mechanism. For example, I've made a lot out of simple !=
inequalities recently, but it's natural to wonder what that might mean
for NOT IN ( ... ) SAOP inequalities. Am I also going to add
specialized code that passes those down to the index AM? Where do you
draw the line?

I totally accept that there is a significant amount of gray area, and
that that's likely to remain true for the foreseeable future. But I
also believe that there is a small but important number of things that
are either exactly black or exactly white. If we can actually firmly
agree on what these other things are in days or weeks (which seems
quite doable), then we'll have the right framework for figuring
everything else out over time (possibly over multiple releases). We'll
at least have the right shared vocabulary for discussing the problems,
which is a very good start. I want to have a general structure that
has the right general concepts in place from the start -- that's all.

I also suspect that we'll discover that the large number of gray-area
clauses/items tend to be far less important than
"exactly black" and "exactly white" items. So even if we can only
agree that a small handful of things are in either category, that
small handful will likely be very overrepresented in real world
queries. For example, simple inequalities are very common -- it's
surprising that nbtree can't already handle them directly. I should
have thought of this myself, long ago, but it took your patch to force
me to think about it.

The problem with simple inequalities was "hiding in plain sight" for a
very long time. Could there be anything else like that?

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:
On 8/8/23 22:54, Peter Geoghegan wrote:
> On Tue, Aug 8, 2023 at 1:24 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>> Assuming that that happens, then it immediately gives index scans a
>>> huge advantage over bitmap index scans. At that point it seems
>>> important to describe (in high level terms) where it is that the
>>> advantage is innate, and where it's just because we haven't done the
>>> required work for bitmap index scans. I became confused on this point
>>> myself yesterday. Admittedly I should have been able to figure it out
>>> on my own -- but it is confusing.
>>>
>>
>> Yeah, I agree that might help a lot, particularly for tables that have a
>> significant fraction of not-all-visible pages.
> 
> It also has the potential to make the costing a lot easier in certain
> important cases. Accurately deriving just how many heap accesses can
> be avoided via the VM from the statistics that are available to the
> planner is likely always going to be very difficult. Finding a way to
> make that just not matter at all (in these important cases) can also
> make it safe to bias the costing, such that the planner tends to favor
> index scans (and index-only scans) over bitmap index scans that cannot
> possibly eliminate any heap page accesses via an index filter qual.
> 

Yes, if there's a way to safely skip the visibility check for some
conditions, that would probably make the costing simpler.

Anyway, I find this discussion rather abstract and I'll probably forget
half the important cases by next week. Maybe it'd be good to build a set
of examples demonstrating the interesting cases? We've already used a
couple tenk1 queries for that purpose ...

>> Right, and I'm not against improving that, but I see it more like an
>> independent task. I don't think it needs to (or should) be part of this
>> patch - skipping visibility checks would apply to IOS, while this is
>> aimed only at plain index scans.
> 
> I'm certainly not going to insist on it. Worth considering if putting
> it in scope could make certain aspects of this patch (like the
> costing) easier, though.
> 
> I think that it wouldn't be terribly difficult to make simple
> inequalities into true index quals. I think I'd like to have a go at
> it myself. To some degree I'm trying to get a sense of how much that'd
> help you.
> 

I'm trying to make the patch not dependent on such a change. In a way,
once a clause gets recognized as an index qual, it becomes irrelevant for
my patch. But the patch also doesn't get any simpler, because it still
needs to do the same thing for the remaining quals.

OTOH if there was some facility to decide if a qual is "safe" to be
executed on the index tuple, that'd be nice. But as I already said, I
see it more as an additional optimization, as it only applies to a
subset of cases.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Wed, Aug 9, 2023 at 10:00 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Anyway, I find this discussion rather abstract and I'll probably forget
> half the important cases by next week. Maybe it'd be good to build a set
> of examples demonstrating the interesting cases? We've already used a
> couple tenk1 queries for that purpose ...

That seems wise. I'll try to come up with a list of general principles
with specific and easy to run examples later on today.

> I'm trying to make the patch not dependent on such a change. In a way,
> once a clause gets recognized as an index qual, it becomes irrelevant for
> my patch. But the patch also doesn't get any simpler, because it still
> needs to do the same thing for the remaining quals.

Practically speaking, I don't see any reason why you won't be able to
sign off on the set of principles that I'll lay out for work in this
area, while at the same time continuing with this patch more or less
as originally planned.

At one point I thought that your patch might be obligated to
compensate for its tendency to push down OR-heavy clauses as
expressions, leading to "risky" plans. While I still have a concern
about that now, I'm not going to try to make it your problem. I'm now
inclined to think of this as an existing problem, that your patch will
increase the prevalence of -- but not to the extent that it makes
sense to hold it up. To some extent it is up to me to put my money
where my mouth is.

I'm optimistic about the prospect of significantly ameliorating (if
not fixing) the "risky OR expression plan" problem in the scope of my
work on 17. But even if that doesn't quite happen (say we don't end up
normalizing to CNF in the way that I've suggested for 17), we should
at least reach agreement on the best way forward. If we could just
agree that evaluating complicated OR expressions in index filters is
much worse than finding a way to pass them down as an equivalent index
qual (an SAOP), then I could live with it for another release or two.
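
To spell out the equivalence I mean:

  -- OR-heavy form: at best an index filter, so visibility checks apply
  select ctid, * from tenk1
  where thousand = 1 or thousand = 3 or thousand = 42;

  -- equivalent SAOP form: a true nbtree index qual
  select ctid, * from tenk1
  where thousand = any ('{1,3,42}'::integer[]);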

As I said, I mostly just care about having the right general
principles in place at this point.

> OTOH if there was some facility to decide if a qual is "safe" to be
> executed on the index tuple, that'd be nice. But as I already said, I
> see it more as an additional optimization, as it only applies to a
> subset of cases.

I'm happy to go with that.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:
On 8/9/23 19:53, Peter Geoghegan wrote:
> On Wed, Aug 9, 2023 at 10:00 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Anyway, I find this discussion rather abstract and I'll probably forget
>> half the important cases by next week. Maybe it'd be good to build a set
>> of examples demonstrating the interesting cases? We've already used a
>> couple tenk1 queries for that purpose ...
> 
> That seems wise. I'll try to come up with a list of general principles
> with specific and easy to run examples later on today.
> 

Cool. I'll try to build my own set of examples that I find interesting
either because it's what the patch aims to help with, or because I
expect it to be problematic for some reason. And then we can compare.

>> I'm trying to make the patch not dependent on such a change. In a way,
>> once a clause gets recognized as an index qual, it becomes irrelevant for
>> my patch. But the patch also doesn't get any simpler, because it still
>> needs to do the same thing for the remaining quals.
> 
> Practically speaking, I don't see any reason why you won't be able to
> sign off on the set of principles that I'll lay out for work in this
> area, while at the same time continuing with this patch more or less
> as originally planned.
> 
> At one point I thought that your patch might be obligated to
> compensate for its tendency to push down OR-heavy clauses as
> expressions, leading to "risky" plans. While I still have a concern
> about that now, I'm not going to try to make it your problem. I'm now
> inclined to think of this as an existing problem, that your patch will
> increase the prevalence of -- but not to the extent that it makes
> sense to hold it up. To some extent it is up to me to put my money
> where my mouth is.
> 
> I'm optimistic about the prospect of significantly ameliorating (if
> not fixing) the "risky OR expression plan" problem in the scope of my
> work on 17. But even if that doesn't quite happen (say we don't end up
> normalizing to CNF in the way that I've suggested for 17), we should
> at least reach agreement on the best way forward. If we could just
> agree that evaluating complicated OR expressions in index filters is
> much worse than finding a way to pass them down as an equivalent index
> qual (an SAOP), then I could live with it for another release or two.
> 

Yup, I agree with that principle. The AM can evaluate the expression in
a smarter way, without the visibility checks.

> As I said, I mostly just care about having the right general
> principles in place at this point.
> 
>> OTOH if there was some facility to decide if a qual is "safe" to be
>> executed on the index tuple, that'd be nice. But as I already said, I
>> see it more as an additional optimization, as it only applies to a
>> subset of cases.
> 
> I'm happy to go with that.
> 


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Wed, Aug 9, 2023 at 11:15 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Cool. I'll try to build my own set of examples that I find interesting
> either because it's what the patch aims to help with, or because I
> expect it to be problematic for some reason. And then we can compare.

That would be great. I definitely want to make this a collaborative thing.

> Yup, I agree with that principle. The AM can evaluate the expression in
> a smarter way, without the visibility checks.

Attached text document (which I guess might be my first draft) is an
attempt to put the discussion up until this point on a more formal
footing.

The format here tries to reduce the principles to a handful of bullet
points. For example, one line reads:

+ Index quals are better than equivalent index filters because bitmap
index scans can only use index quals

I'm pretty sure that these are all points that you and I both agree on
already. But you should confirm that. And make your own revisions, as
you see fit.

It's definitely possible that I overlooked an interesting and durable
advantage that index filters have. If there is some other way in which
index filters are likely to remain the best and only viable approach,
then we should note that. I just went with the really obvious case of
an expression that definitely needs visibility checks to avoid ever
throwing a division-by-zero error, related to some invisible tuple.
It's an extreme case in that it focuses on requirements that seem just
about unavoidable in any future world. (Come to think of it, the
biggest and most durable advantage for index filters is probably just
how general they are, which I do mention.)

--
Peter Geoghegan

Attachments

Re: Use of additional index columns in rows filtering

From
Jeff Davis
Date:
On Wed, 2023-08-09 at 17:14 -0700, Peter Geoghegan wrote:
> + Index quals are better than equivalent index filters because bitmap
> index scans can only use index quals

It seems there's consensus that:

  * Index Filters (Tomas's patch and the topic of this thread) are more
general, because they can work on any expression, e.g. 1/x, which can
throw an error if x=0 (see the sketch after this list).
  * Index quals are more optimizable, because such operators are not
supposed to throw errors or have side effects, so we can evaluate them
before determining visibility.
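
A minimal sketch of that first point, using the thread's tenk1 table
(the clause itself is contrived):

  -- only safe to evaluate after the visibility check: running
  -- 1/tenthous against a dead tuple with tenthous = 0 would raise
  -- division-by-zero for a row the query should never even see
  select ctid, * from tenk1
  where thousand = 42 and 1 / tenthous > 0;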

I wouldn't describe one as "better" than the other, but I assume you
meant "more optimizable".

It's interesting that there's overlap in utility between Tomas's
current patch and Peter's work on optimizing SAOPs. But I don't see a
lot of tension there -- it seems like Tomas's patch will always be
useful for filters that might throw an error (or have other side
effects).

Is there any part of the design here that's preventing this patch from
moving forward? If not, what are the TODOs for the current patch, or is
it just waiting on more review?

Regards,
    Jeff Davis




Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Thu, Aug 17, 2023 at 4:29 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2023-08-09 at 17:14 -0700, Peter Geoghegan wrote:
> > + Index quals are better than equivalent index filters because bitmap
> > index scans can only use index quals
>
> It seems there's consensus that:
>
>   * Index Filters (Tomas's patch and the topic of this thread) are more
> general, because they can work on any expression, e.g. 1/x, which can
> throw an error if x=0.

Agreed, but with one small caveat: they're not more general when it
comes to bitmap index scans, which seem like they'll only ever be able
to support index quals. But AFAICT that's the one and only exception.

>   * Index quals are more optimizable, because such operators are not
> supposed to throw errors or have side effects, so we can evaluate them
> before determining visibility.

Right. Though there is a second reason: the index AM can use them to
navigate the index more efficiently with true index quals. At least in
the case of SAOPs, and perhaps in other cases in the future.

> I wouldn't describe one as "better" than the other, but I assume you
> meant "more optimizable".

The use of the term "better" is actually descriptive of Tomas' patch.
It is structured in a way that conditions the use of index filters on
the absence of equivalent index quals for eligible/indexed clauses. So
it really does make sense to think of it as a "qual hierarchy" (to use
Tomas' term).

It's also true that it will always be fundamentally impossible to use
index quals for a significant proportion of all clause types, so
"better when feasible at all" is slightly more accurate.

> Is there any part of the design here that's preventing this patch from
> moving forward? If not, what are the TODOs for the current patch, or is
> it just waiting on more review?

Practically speaking, I have no objections to moving ahead with this.
I believe that the general structure of the patch makes sense, and I
don't expect Tomas to do anything that wasn't already expected before
I chimed in.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > The only thing the patch does is look at clauses we decided not to
> > treat as index quals, and maybe still evaluate them on the index. And I
> > don't think I want to move these goalposts much further.
>
> Avoiding the need for visibility checks completely (in at least a
> subset of cases) was originally your idea. I'm not going to insist on
> it, or anything like that. It just seems like something that'll add a
> great deal of value over what you have already.

Another idea in this area occurred to me today: it would be cool if
non-key columns from INCLUDE indexes could completely avoid visibility
checks (not just avoid heap accesses using the visibility map) in
roughly the same way that we'd expect with an equivalent key column
already, today (if it was an index filter index qual). Offhand I think
that it would make sense to do that outside of index AMs, by extending
the mechanism from Tomas' patch to this special class of expression.
We'd invent some other class of index filter that "outranks"
conventional index filters when the optimizer can safely determine
that they're "index filters with the visibility characteristics of
true index quals". I am mostly thinking of simple, common cases here
(e.g., an equality operator + constant).

This might require the involvement of the relevant btree opclass,
since that's where the no-visibility-check-safety property actually
comes from. However, it wouldn't need to be limited to INCLUDE B-Tree
indexes. You could for example do this with a GiST INCLUDE index that
had no opclass information about the datatype/operator. That'd be a
natural advantage of an approach based on index filters.
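
A sketch of that GiST case (schema invented for illustration):

  create table pts (p point, b integer);
  create index pts_gist on pts using gist (p) include (b);

  -- gist has no opclass knowledge of integer "=", and a non-key column
  -- could never be an index qual anyway -- yet "b = 42" could still be
  -- evaluated on the index tuple as this special class of index filter
  select * from pts where p <@ box '((0,0),(1,1))' and b = 42;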

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Peter Geoghegan
Date:
On Mon, Aug 7, 2023 at 12:34 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 8/7/23 02:38, Peter Geoghegan wrote:
> > This plan switchover isn't surprising in itself -- it's one of the most
> > important issues addressed by my SAOP patch. However, it *is* a little
> > surprising that your patch doesn't even manage to use "Index Filter"
> > quals. It appears that it is only capable of using table filter quals.
> > Obviously, the index has all the information that expression
> > evaluation needs, and yet I see "Filter: (tenk1.tenthous = ANY
> > ('{1,3,42,43,44,45,46,47,48,49,50}'::integer[]))". So no improvement
> > over master here.

> Right. This happens because the matching of SAOP to indexes happens in
> multiple places. Firstly, create_index_paths() matches the clauses to
> the index by calling
>
>     match_restriction_clauses_to_index
>      -> match_clauses_to_index
>          -> match_clause_to_index
>
> Which is where we also decide which *unmatched* clauses can be filters.
> And this *does* match the SAOP to the index key, hence no index filter.
>
> But then we call get_index_paths/build_index_path a little bit later,
> and that decides to skip "lower SAOP" (which seems a bit strange,
> because the column is "after" the equality, but meh). Anyway, at this
> point we already decided what's a filter, ignoring the index clauses,
> and not expecting any backsies.
>
> The simplest fix seems to be to add these skipped SAOP clauses as
> filters. We know they can be evaluated on the index ...

Update on this: I recently posted v2 of my patch, which completely
removes build_index_paths's "skip_lower_saop" mechanism. This became
possible in v2 because it fully eliminated all of the advantages that
SAOP-style filter quals might have had over true index quals, through
further enhancements on the nbtree side. There is simply no reason to
generate alternative index paths with filter quals in the first place.
(As I seem to like to say, "choice is confusion".)

In short, v2 of my patch fully adheres to the principles set out in
the "qual hierarchy" doc. The planner no longer needs to know anything
about how nbtree executes SAOP index quals, except when costing them.
To the planner, there is pretty much no difference between "=" and "=
ANY()" (for index AMs that natively support SAOP execution).

I imagine that this general planner structure will be ideal for your
patch. If I'm not mistaken, it will allow you to completely avoid
treating SAOPs as a special case. Although the build_index_paths
"skip_lower_saop" thing might have created issues for the approach
your patch takes in the planner, that seems to me to work best as an
argument against the "skip_lower_saop" mechanism -- it was always a
kludge IMV.

--
Peter Geoghegan



Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:
Hi,

It took me a while but I finally got back to working on this patch. I
reread the summary of principles Peter shared in August, and I do agree
with all the main points - the overall qual hierarchy and the various
examples and counter-examples.

I'll respond to a couple things in this sub-thread, and then I plan to
post an updated version of the patch with a discussion of a couple
problems I still haven't managed to solve (not necessarily new ones).

On 8/19/23 00:19, Peter Geoghegan wrote:
> On Thu, Aug 17, 2023 at 4:29 PM Jeff Davis <pgsql@j-davis.com> wrote:
>> On Wed, 2023-08-09 at 17:14 -0700, Peter Geoghegan wrote:
>>> + Index quals are better than equivalent index filters because bitmap
>>> index scans can only use index quals
>>
>> It seems there's consensus that:
>>
>>   * Index Filters (Tomas's patch and the topic of this thread) are more
>> general, because they can work on any expression, e.g. 1/x, which can
>> throw an error if x=0.
> 
> Agreed, but with one small caveat: they're not more general when it
> comes to bitmap index scans, which seem like they'll only ever be able
> to support index quals. But AFAICT that's the one and only exception.
> 

Yeah. Some conditions are never gonna be translated into proper index
quals, either because it's not really possible or just because no one
has implemented it yet.

FWIW it's not immediately obvious to me if bitmap index scans are unable
to support index filters because of some inherent limitation, or whether
it's something we could implement. Some index AMs simply can't support
index filters, because they don't know the "full" index tuple
(e.g. GIN), but maybe other AMs could support that ...

>>   * Index quals are more optimizable, because such operators are not
>> supposed to throw errors or have side effects, so we can evaluate them
>> before determining visibility.
> 
> Right. Though there is a second reason: the index AM can use them to
> navigate the index more efficiently with true index quals. At least in
> the case of SAOPs, and perhaps in other cases in the future.
> 
>> I wouldn't describe one as "better" than the other, but I assume you
>> meant "more optimizable".
> 
> The use of the term "better" is actually descriptive of Tomas' patch.
> It is structured in a way that conditions the use of index filters on
> the absence of equivalent index quals for eligible/indexed clauses. So
> it really does make sense to think of it as a "qual hierarchy" (to use
> Tomas' term).
> 
> It's also true that it will always be fundamentally impossible to use
> index quals for a significant proportion of all clause types, so
> "better when feasible at all" is slightly more accurate.
> 

Yeah, I agree with all of this too. I think that in all practical cases
an index qual is strictly better than an index filter, for exactly these
reasons.

>> Is there any part of the design here that's preventing this patch from
>> moving forward? If not, what are the TODOs for the current patch, or is
>> it just waiting on more review?
> 
> Practically speaking, I have no objections to moving ahead with this.
> I believe that the general structure of the patch makes sense, and I
> don't expect Tomas to do anything that wasn't already expected before
> I chimed in.
> 

I agree the general idea is sound. The main "problem" in the original
patch was about costing changes and the implied risk of picking the
wrong plan (instead of e.g. a bitmap index scan). That's pretty much
what the "risk" in example (4) in the principles.txt is about, AFAICS.

I think the consensus is that if we don't change the cost, this issue
mostly goes away - we won't pick an index scan in cases where we'd pick
a different plan without the patch, and for index scans it's (almost)
always correct to replace "table filter" with "index filter".

If that issue goes away, I think there are two smaller ones - picking
which conditions to promote to index filters, and actually tweaking the
index scan code to do that.

The first issue seems similar to the costing issue - in fact, it really
is a costing problem. The goal is to minimize the amount of work needed
to evaluate the conditions on all rows, and if we knew selectivity+cost
for each condition, it'd be a fairly simple matter. But in practice we
don't know this - the selectivity estimates are rather shaky (and doubly
so for correlated conditions), and the procedure cost estimates are
quite crude too ...
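
To illustrate why knowing both numbers would make this simple, here's a
tiny standalone toy (not patch code, numbers made up) computing the
expected per-row cost of an AND-ed clause list evaluated in a given
order:

    #include <stdio.h>

    typedef struct
    {
        double cost;        /* per-row cost to evaluate the clause */
        double selectivity; /* fraction of rows that pass it */
    } Clause;

    /*
     * Expected per-row cost of evaluating the clauses in the given
     * order, assuming AND semantics with short-circuiting: clause i is
     * only reached by rows that passed all preceding clauses.
     */
    static double expected_cost(const Clause *c, int n)
    {
        double pass = 1.0;
        double total = 0.0;

        for (int i = 0; i < n; i++)
        {
            total += pass * c[i].cost;
            pass *= c[i].selectivity;
        }
        return total;
    }

    int main(void)
    {
        Clause cheap_first[] = { {1.0, 0.9}, {100.0, 0.01} };
        Clause expensive_first[] = { {100.0, 0.01}, {1.0, 0.9} };

        /* 1 + 0.9 * 100 = 91.00 */
        printf("cheap first:     %.2f\n", expected_cost(cheap_first, 2));
        /* 100 + 0.01 * 1 = 100.01 */
        printf("expensive first: %.2f\n", expected_cost(expensive_first, 2));
        return 0;
    }

With good estimates, picking the order (and hence the filters) really
would be mechanical; the trouble is the estimates themselves.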

The risk is that we might move a very expensive condition "forward".
Imagine a condition with procost=1000000, which would normally be
evaluated last, after all other clauses. But if we promote it to an
index filter, that'd no longer be true.

What Jeff proposed last week was roughly this:

- find min(procost) for clauses that can't be index filters
- consider all clauses with procost <= min(procost) as index filters
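
This is how I read that heuristic, in toy form (standalone C with
hypothetical names, not actual planner code): promoting a clause is
safe as long as it's no more expensive than the cheapest clause that
has to remain a table filter anyway, because in plain procost order it
would have been evaluated no later than that clause.

    #include <float.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct
    {
        double procost;        /* cost estimate for the clause */
        bool   filter_capable; /* could this be an index filter? */
        bool   use_as_filter;  /* output: promote it? */
    } QualClause;

    static void choose_index_filters(QualClause *quals, int n)
    {
        double min_table_cost = DBL_MAX;

        /* find min(procost) among clauses that can't be index filters */
        for (int i = 0; i < n; i++)
            if (!quals[i].filter_capable && quals[i].procost < min_table_cost)
                min_table_cost = quals[i].procost;

        /* promote capable clauses no more expensive than that */
        for (int i = 0; i < n; i++)
            quals[i].use_as_filter = quals[i].filter_capable &&
                quals[i].procost <= min_table_cost;
    }

    int main(void)
    {
        /* the procost=1000000 clause stays a table filter */
        QualClause quals[] = {
            { 1.0,       true,  false },
            { 10.0,      false, false },  /* cheapest non-capable clause */
            { 1000000.0, true,  false },
        };

        choose_index_filters(quals, 3);
        for (int i = 0; i < 3; i++)
            printf("clause %d: %s\n", i,
                   quals[i].use_as_filter ? "index filter" : "table filter");
        return 0;
    }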

But after looking at this I realized order_qual_clauses() does a very
similar thing for regular clauses, and the proposed logic contradicts
the order_qual_clauses() heuristics a bit.

In particular, order_qual_clauses() orders the clauses by procost, but
then it's careful not to reorder the clauses with the same procost. With
the proposed heuristics, that'd change - the clauses might get reordered
in various ways, possibly with procost values [1, 10, 100, 1, 10, 100]
or something like that.

Not sure if we want to relax the heuristics like this. There's also the
practical issue that order_qual_clauses() gets called quite late, long
after the index filters were built. And it also considers security
levels, which I ignored until now.

But now that I think about it, with the original patch it had to be
decided when building the path, because that's when the costing happens.
With the costing changes removed, we can probably do that much later,
while creating the plan, after order_qual_clauses() does all the work.

We could walk the qpqual list, and stop on the first element that can't
be treated as index filter. That'd also handle the security levels, and
it'd *almost* do what Jeff proposed, I think.
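
In toy form (again standalone C, not the actual patch), the "walk the
ordered list" idea is just a prefix scan over the already-sorted quals:

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * The quals are already sorted by order_qual_clauses(), so take the
     * longest prefix of filter-capable clauses as index filters, and
     * leave everything from the first non-capable clause onward as
     * table filters. This never reorders clauses, so the
     * order_qual_clauses() heuristics stay intact.
     */
    static int index_filter_prefix(const bool *filter_capable, int nquals)
    {
        int nfilters = 0;

        while (nfilters < nquals && filter_capable[nfilters])
            nfilters++;
        return nfilters;
    }

    int main(void)
    {
        /* quals in procost order; the third can't be an index filter */
        bool capable[] = { true, true, false, true };

        printf("%d quals become index filters\n",
               index_filter_prefix(capable, 4));   /* prints 2 */
        return 0;
    }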

The main difference is that it'd not reorder clauses with the same
procost. With multiple clauses with the same procost, it'd depend on
which one happens to be first in the list, which AFAIK depends on the
order of conditions in the query. That may be a bit surprising, I guess,
because it seems natural that in a declarative language this should not
really matter. Yes, we already depend on the query order today, but it
probably does not have a huge impact. If it affects whether a condition
is treated as an index filter, the differences are likely to be much
more significant.

(FWIW this reminds me of the issues we had with GROUP BY optimization,
which ultimately got reverted because the reordering was considered too
risky. I'm afraid we might end in the same situation ...)


As for the other issue, that's more about how to deal with the various
quals in the generic scan code. Imagine we have multiple conditions, and
some of them can be treated as index filters. So for example we may
end up with this:

  quals = [A, B, C, D]
  index-filters = [A, B]

And now a tuple can fall into three categories:

  a) all-visible=false, i.e. can't use index filter => quals

  b) all-visible=true, and index-filters evaluates to false

  c) all-visible=true, and index-filters evaluates to true

The first two cases are fine, but for (c) we need to also evaluate the
remaining non-filter quals [C, D]. We may build such a list - the 0002
patch does that and shows it in EXPLAIN. But where/how would we evaluate
it? The code in execScan.c uses ScanState->ps.qual, which is set to
[A,B,C,D]. That's correct, but it does more work than strictly
necessary - the index filters get evaluated again. We can't easily
change it to [C,D], because that would break the (a) case. Which quals
need to be evaluated on the heap tuple depends on the visibility map and
on the index-filter result.
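
To spell out the three cases, here's a standalone toy of the per-tuple
logic (hypothetical names and a simplified model, not executor code):

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of one candidate tuple from the index scan. */
    typedef struct
    {
        bool all_visible;    /* is the tuple's page all-visible? */
        bool filters_pass;   /* do the index filters [A, B] pass? */
        bool remaining_pass; /* do the remaining quals [C, D] pass? */
        int  heap_fetches;   /* counts the work we wanted to avoid */
    } ToyTuple;

    /*
     * Which qual list gets evaluated on the heap tuple depends on the
     * visibility map and on the index-filter result, so a single
     * ScanState->ps.qual holding [A,B,C,D] can't cover all three paths
     * without redundant work.
     */
    static bool tuple_qualifies(ToyTuple *t)
    {
        if (!t->all_visible)
        {
            /* (a) can't trust the index tuple: heap fetch, [A,B,C,D] */
            t->heap_fetches++;
            return t->filters_pass && t->remaining_pass;
        }

        if (!t->filters_pass)
            return false;    /* (b) rejected with no heap access */

        /* (c) index filters passed on the index tuple; heap for [C,D] */
        t->heap_fetches++;
        return t->remaining_pass;
    }

    int main(void)
    {
        ToyTuple t = { .all_visible = true, .filters_pass = false };

        printf("qualifies: %d, heap fetches: %d\n",
               tuple_qualifies(&t), t.heap_fetches);  /* 0 and 0 */
        return 0;
    }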

I think there are two options. First, we may accept that some of the
index filters may get evaluated twice. That's not great, because it
increases the risk of regressions. Second, we could disable
ScanState->ps.qual if there are index filters, and just do all the work
in nodeIndexscan.c.

But that seems quite ugly - in a way, the code already does that, which
is where the two loops

    while (true)
    {
        for (;;)
        {
            ...
        }
    }

come from. I was hoping to simplify / get rid of this, not make it do
even more stuff ... :-(


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Use of additional index columns in rows filtering

From
Tomas Vondra
Date:
On 8/19/23 02:49, Peter Geoghegan wrote:
> On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
>>> The only thing the patch does is it looks at clauses we decided not to
>>> treat as index quals, and do maybe still evaluate them on index. And I
>>> don't think I want to move these goalposts much further.
>>
>> Avoiding the need for visibility checks completely (in at least a
>> subset of cases) was originally your idea. I'm not going to insist on
>> it, or anything like that. It just seems like something that'll add a
>> great deal of value over what you have already.
> 
> Another idea in this area occurred to me today: it would be cool if
> non-key columns from INCLUDE indexes could completely avoid visibility
> checks (not just avoid heap accesses using the visibility map) in
> roughly the same way that we'd expect with an equivalent key column
> already, today (if it was an index filter index qual). Offhand I think
> that it would make sense to do that outside of index AMs, by extending
> the mechanism from Tomas' patch to this special class of expression.
> We'd invent some other class of index filter that "outranks"
> conventional index filters when the optimizer can safely determine
> that they're "index filters with the visibility characteristics of
> true index quals". I am mostly thinking of simple, common cases here
> (e.g., an equality operator + constant).
> 
> This might require the involvement of the relevant btree opclass,
> since that's where the no-visibility-check-safety property actually
> comes from. However, it wouldn't need to be limited to INCLUDE B-Tree
> indexes. You could for example do this with a GiST INCLUDE index that
> had no opclass information about the datatype/operator. That'd be a
> natural advantage of an approach based on index filters.
> 

I haven't really thought about INCLUDE columns at all. I agree it seems
doable and possibly quite useful, and doing it outside the index AM
seems about right. The index does not know about the opclass for these
columns - it just stores them, so why/how should it be responsible for
doing something with them? And as you said, it seems like a (fairly
natural?) extension of the patch discussed in this thread.

That being said, I don't intend to make this work in v1. Once the other
issues get solved in some way, I may try hacking up a WIP, but mostly to
check there's no design issue that'd make it hard in the future.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company