Обсуждение: Pushing ScalarArrayOpExpr support into the btree index AM

Поиск
Список
Период
Сортировка

Pushing ScalarArrayOpExpr support into the btree index AM

От
Tom Lane
Дата:
Almost immediately after we committed index-only scans, there were
complaints that it didn't work with "indexkey IN (...)" conditions, that
is ScalarArrayOpExpr index quals.  That's because the current code only
supports ScalarArrayOpExpr as a bitmap indexscan qual, not a regular
indexscan.  The executor performs a separate indexscan for each element of
the array, relying on the bitmap mechanism to eliminate duplicate hits on
the same heap tuple.

In principle it's not hard to see how ScalarArrayOpExpr could be supported
as a regular indexqual for btree indexes.  If the comparison operator has
<, <=, >=, or > semantics, then the array condition is equivalent to a
simple scalar condition using the greatest or least array element
respectively.  Otherwise (i.e., the operator is =):

1. Sort the array elements into the same order as the btree, eliminating
duplicates and NULL entries.  (If there are no non-null elements, the
qual condition is unsatisfiable and we can skip the indexscan.)

2. Perform an indexscan for each remaining array element, in sequence.
Since we eliminated equal values in step 1, there can be no two matches
to the same index entry, so there's no need for duplicate elimination.
What's more, because we sorted the array to match the index order, index
entries will be matched in index order, so the results of the scan can
still be expected to come out in sorted order, a critical expectation for
btree indexscans.

If we have more than one ScalarArrayOpExpr then we have to be a bit
careful about how we perform the multiple indexscans to ensure that output
ordering is preserved, but it's certainly doable.

So, at least as far as btrees are concerned, it seems like I implemented
the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed
down into the index AM.  The above rules seem pretty btree-specific, so
I don't think that we ought to have the main executor doing any of this.
I envision doing this by marking btree as supporting ScalarArrayOpExpr
scankeys directly, so that the executor does nothing special with them,
and the planner treats them the same as regular scalar indexquals.

So that leaves me wondering whether the existing executor-level
implementation of ScalarArrayOpExpr indexquals ought to be ripped out
entirely.  It would not save all that much code in the executor proper
(basically ExecIndexEvalArrayKeys, ExecIndexAdvanceArrayKeys, and a few
lines of supporting logic).  However, there's a fair amount of cruft
in the planner to deal with the concept that ScalarArrayOpExpr is
supported as a bitmap indexscan qual but not a plain indexscan qual.
If that code is only going to get exercised for non-btree index types,
it's likely to be under-tested and hence a continuing source of bugs.

In principle somebody could be doing something likeWHERE pointcol <@ ANY (ARRAY[list of box values])
and expecting that to generate a bitmap indexscan on a GIST index, but
is it likely that anyone is doing that?  (As opposed to the equivalent
formulation with "pointcol <@ box1 OR pointcol <@ box2 ...", which would
still work to generate OR'd bitmap scans even if we took out the
ScalarArrayOpExpr logic.)

Thoughts?
        regards, tom lane


Re: Pushing ScalarArrayOpExpr support into the btree index AM

От
Robert Haas
Дата:
On Sat, Oct 15, 2011 at 2:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In principle somebody could be doing something like
>        WHERE pointcol <@ ANY (ARRAY[list of box values])
> and expecting that to generate a bitmap indexscan on a GIST index, but
> is it likely that anyone is doing that?  (As opposed to the equivalent
> formulation with "pointcol <@ box1 OR pointcol <@ box2 ...", which would
> still work to generate OR'd bitmap scans even if we took out the
> ScalarArrayOpExpr logic.)

That seems like a pretty natural formulation to me, so I would be
rather reluctant to assume nobody's doing it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Pushing ScalarArrayOpExpr support into the btree index AM

От
Noah Misch
Дата:
On Sat, Oct 15, 2011 at 02:58:45PM -0400, Tom Lane wrote:
> [algorithm for a regular index scan satisfying "key IN (...)"]

> So, at least as far as btrees are concerned, it seems like I implemented
> the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed
> down into the index AM.  The above rules seem pretty btree-specific, so
> I don't think that we ought to have the main executor doing any of this.
> I envision doing this by marking btree as supporting ScalarArrayOpExpr
> scankeys directly, so that the executor does nothing special with them,
> and the planner treats them the same as regular scalar indexquals.

Sounds sensible.  The algorithm applies to more than ScalarArrayOpExpr; is it
not the ability to handle an OR'ed list of ScanKey instead of an AND'ed one?
Would it be worth exposing the capability along those lines instead, even if the
only initial consumer is ScalarArrayOpExpr?

> In principle somebody could be doing something like
>     WHERE pointcol <@ ANY (ARRAY[list of box values])
> and expecting that to generate a bitmap indexscan on a GIST index, but
> is it likely that anyone is doing that?  (As opposed to the equivalent
> formulation with "pointcol <@ box1 OR pointcol <@ box2 ...", which would
> still work to generate OR'd bitmap scans even if we took out the
> ScalarArrayOpExpr logic.)

Removing the "key IN (...)" optimization for hash indexes would add one more
barrier to making that access method practical.


Re: Pushing ScalarArrayOpExpr support into the btree index AM

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Sat, Oct 15, 2011 at 02:58:45PM -0400, Tom Lane wrote:
>> [algorithm for a regular index scan satisfying "key IN (...)"]

> Sounds sensible.  The algorithm applies to more than ScalarArrayOpExpr; is it
> not the ability to handle an OR'ed list of ScanKey instead of an AND'ed one?

No, because it's restricted to all the elements having the same operator
and same index column; it's not a generic OR.
        regards, tom lane


Re: Pushing ScalarArrayOpExpr support into the btree index AM

От
Florian Pflug
Дата:
On Oct15, 2011, at 20:58 , Tom Lane wrote:
> So, at least as far as btrees are concerned, it seems like I implemented
> the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed
> down into the index AM.  The above rules seem pretty btree-specific, so
> I don't think that we ought to have the main executor doing any of this.
> I envision doing this by marking btree as supporting ScalarArrayOpExpr
> scankeys directly, so that the executor does nothing special with them,
> and the planner treats them the same as regular scalar indexquals.

Hm, would this make it possible to teach the array GIN ops to also handle
ScalarArrayOpExpr? I've recently had to put
 ARRAY[$1] <@ $2 AND $1 = ANY($2)

into an (inlineable) SQL function to make it use a (btree) index if
$1 is a scalar-values field (and $1 constant array) and a GIN index if $2 
is a GIN-indexed array-values field (and $2 a constant array). Which of
course sucks from an efficiency POV.

At the time I didn't see a way to easily teach GIN to support ANY, but
with your proposal it seems entirely doable. Unless I'm missing something,
that is.

> In principle somebody could be doing something like
>     WHERE pointcol <@ ANY (ARRAY[list of box values])
> and expecting that to generate a bitmap indexscan on a GIST index, but
> is it likely that anyone is doing that?  (As opposed to the equivalent
> formulation with "pointcol <@ box1 OR pointcol <@ box2 ...", which would
> still work to generate OR'd bitmap scans even if we took out the
> ScalarArrayOpExpr logic.)

Hm, if the number of box values isn't fixed, the ANY form plays much nicer
nicer with parametrized statements than the OR form. So I think we shouldn't
take that away from people. 

OTOH it seems that, depending on the actual list of box values, a bitmap
indexscan isn't the smartest way to do this. If the boxes overlap (or are
close enough to each other), using the bounding box to search the index
and then filtering the remaining rows ought to be more efficient.

So maybe we should remove the support for ScalarArrayOpExpr from the main
executor, but add support for it to GIST?

best regards,
Florian Pflug





Re: Pushing ScalarArrayOpExpr support into the btree index AM

От
Tom Lane
Дата:
Florian Pflug <fgp@phlo.org> writes:
> On Oct15, 2011, at 20:58 , Tom Lane wrote:
>> So, at least as far as btrees are concerned, it seems like I implemented
>> the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed
>> down into the index AM.  The above rules seem pretty btree-specific, so
>> I don't think that we ought to have the main executor doing any of this.
>> I envision doing this by marking btree as supporting ScalarArrayOpExpr
>> scankeys directly, so that the executor does nothing special with them,
>> and the planner treats them the same as regular scalar indexquals.

> Hm, would this make it possible to teach the array GIN ops to also handle
> ScalarArrayOpExpr?

Hmm, maybe.  In principle the index AM can always do this at least as
efficiently as the executor can, and maybe there's actual wins to be had
in GIST and GIN.  So another route to getting rid of the executor-level
support would be to implement ScalarArrayOpExpr in all the AMs.  I'm not
personally volunteering to do that though.

> I've recently had to put
>   ARRAY[$1] <@ $2 AND $1 = ANY($2)
> into an (inlineable) SQL function to make it use a (btree) index if
> $1 is a scalar-values field (and $1 constant array) and a GIN index if $2 
> is a GIN-indexed array-values field (and $2 a constant array). Which of
> course sucks from an efficiency POV.

That doesn't seem like the same thing at all, because the indexed column
is on different sides of the expression in the two cases.  The situation
that I'm worried about is "indexedcolumn op ANY(arrayconstant)", and
what you're bringing up is "constant op ANY(indexedarraycolumn)".  To
fit the latter into the existing opclass infrastructure, we'd have to
somehow teach the planner that "constant op ANY(indexedarraycolumn)"
is interchangeable with "indexedarraycolumn @> constant", for pairs of
operators where that's indeed the case.  Seems like that'd be a lot
messier than the use-case warrants.
        regards, tom lane


Re: Pushing ScalarArrayOpExpr support into the btree index AM

От
Florian Pflug
Дата:
On Oct16, 2011, at 19:09 , Tom Lane wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> On Oct15, 2011, at 20:58 , Tom Lane wrote:
>>> So, at least as far as btrees are concerned, it seems like I implemented
>>> the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed
>>> down into the index AM.  The above rules seem pretty btree-specific, so
>>> I don't think that we ought to have the main executor doing any of this.
>>> I envision doing this by marking btree as supporting ScalarArrayOpExpr
>>> scankeys directly, so that the executor does nothing special with them,
>>> and the planner treats them the same as regular scalar indexquals.
> 
>> Hm, would this make it possible to teach the array GIN ops to also handle
>> ScalarArrayOpExpr?
> 
> Hmm, maybe.  In principle the index AM can always do this at least as
> efficiently as the executor can, and maybe there's actual wins to be had
> in GIST and GIN.  So another route to getting rid of the executor-level
> support would be to implement ScalarArrayOpExpr in all the AMs.  I'm not
> personally volunteering to do that though.

Hm, that sounds like we ought to leave the existing infrastructure in
the main executor in place until we have GIN and GIST support.

>> I've recently had to put
>>  ARRAY[$1] <@ $2 AND $1 = ANY($2)
>> into an (inlineable) SQL function to make it use a (btree) index if
>> $1 is a scalar-values field (and $1 constant array) and a GIN index if $2 
>> is a GIN-indexed array-values field (and $2 a constant array). Which of
>> course sucks from an efficiency POV.
> 
> That doesn't seem like the same thing at all, because the indexed column
> is on different sides of the expression in the two cases.  The situation
> that I'm worried about is "indexedcolumn op ANY(arrayconstant)", and
> what you're bringing up is "constant op ANY(indexedarraycolumn)".

Hm, true

> To fit the latter into the existing opclass infrastructure, we'd have to
> somehow teach the planner that "constant op ANY(indexedarraycolumn)"
> is interchangeable with "indexedarraycolumn @> constant", for pairs of
> operators where that's indeed the case.  Seems like that'd be a lot
> messier than the use-case warrants.

That exactly was what convinced me previously that there's no easy way
to do this. I had hoped that with your patch only the index AMs, instead of
the planner, need to know about these equivalences. 

Couldn't we teach the main executor to push a ScalarArrayOpExpr down
into the index AM if the operator belongs to the index's opclass, one
side is indexed, and the other is constant?

best regards,
Florian Pflug




Re: Pushing ScalarArrayOpExpr support into the btree index AM

От
Tom Lane
Дата:
Florian Pflug <fgp@phlo.org> writes:
> On Oct16, 2011, at 19:09 , Tom Lane wrote:
>> That doesn't seem like the same thing at all, because the indexed column
>> is on different sides of the expression in the two cases.  The situation
>> that I'm worried about is "indexedcolumn op ANY(arrayconstant)", and
>> what you're bringing up is "constant op ANY(indexedarraycolumn)".

> Couldn't we teach the main executor to push a ScalarArrayOpExpr down
> into the index AM if the operator belongs to the index's opclass, one
> side is indexed, and the other is constant?

Well, no, unless you're proposing to somehow implement the "constant op
ANY(indexedarraycolumn)" case in all the AMs.  I don't see any practical
way to do it in btree, for one.
        regards, tom lane


Re: Pushing ScalarArrayOpExpr support into the btree index AM

От
Florian Pflug
Дата:
On Oct16, 2011, at 20:26 , Tom Lane wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> On Oct16, 2011, at 19:09 , Tom Lane wrote:
>>> That doesn't seem like the same thing at all, because the indexed column
>>> is on different sides of the expression in the two cases.  The situation
>>> that I'm worried about is "indexedcolumn op ANY(arrayconstant)", and
>>> what you're bringing up is "constant op ANY(indexedarraycolumn)".
> 
>> Couldn't we teach the main executor to push a ScalarArrayOpExpr down
>> into the index AM if the operator belongs to the index's opclass, one
>> side is indexed, and the other is constant?
> 
> Well, no, unless you're proposing to somehow implement the "constant op
> ANY(indexedarraycolumn)" case in all the AMs.  I don't see any practical
> way to do it in btree, for one.

Hm, true. Also, the "operator belongs to the index's opclass" part of the
push-down condition I proposed was wrong anyway, because the "=" operator
in e.g.
 3 = ANY(indexed_int_array_column)

isn't in the opclass int_array_ops. From that, it seems that what would be
needed to make the planner use a GIN index to evaluate the qual above is a
a way for it to realize that there's a connection between some GIN indices
(like *_array_ops) and btree opclasses on the GIN index's storage type. Which
would be nice, but I see now that it has little to do with your proposal,
which is only concerned with operators from the index's opclass.

best regards,
Florian Pflug