Обсуждение: Functional indexes with slow functions are misplanned

Поиск
Список
Период
Сортировка

Functional indexes with slow functions are misplanned

От
Jeff Janes
Дата:
When the results of an expression can be obtained from a functional index,
the expression never needs to be evaluated.  But the planner doesn't seem
to know that.  It thinks the expression is evaluated not only once per row,
but multiple times, presumably at each step as it descends the btree.


So in the example given below the planner will choose to do a very
expensive hash or merge join that read the entire table and calls the slow
function on each row, rather than the much cheaper nested loop where the
function has been pre-evaluated and stored in the index and is simply
tested as part of an inner loop.


Cranking up the cost of the function does no good, because that just keeps
punishing the correct plan at least as much as the others.


I've tested this in 9.2 and HEAD.


This is a pretty silly test case, but it reproduces the real issue I've
seen.


create language plperl;

create table foo1 as select x::text from generate_series(1,1000) foo (x);

create table foo2 as select reverse(x) from foo1;

--use a fast version to set up the demo, as we are impatient

CREATE or replace FUNCTION slow_reverse(text) RETURNS text

    LANGUAGE plperl IMMUTABLE STRICT COST 1000000

    AS $_X$

  return reverse($_[0]);

$_X$;

create index on foo2 (slow_reverse(reverse));

analyze foo2;

--put the slow version in place.

CREATE or replace FUNCTION slow_reverse(text) RETURNS text

    LANGUAGE plperl IMMUTABLE STRICT COST 1000000

    AS $_X$

  my $foo; foreach (1..1e6) {$foo+=sqrt($_)};

  return reverse($_[0]);

$_X$;



explain select * from foo1 where exists (select 1 from foo2 where
slow_reverse(reverse)=x);

--- strong-arm it into using the functional index.

set enable_hashjoin TO off;

set enable_mergejoin TO off;

explain select * from foo1 where exists (select 1 from foo2 where
slow_reverse(reverse)=x);

--- see, it actually is fast!

select * from foo1 where exists (select 1 from foo2 where
slow_reverse(reverse)=x);


Cheers,


Jeff

Re: Functional indexes with slow functions are misplanned

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> When the results of an expression can be obtained from a functional index,
> the expression never needs to be evaluated.  But the planner doesn't seem
> to know that.  It thinks the expression is evaluated not only once per row,
> but multiple times, presumably at each step as it descends the btree.

Hmm ... there are a lot of things that are not done very well for
functional indexes, but at least part of this has nothing to do with that.

cost_index() thinks it can use list_difference_ptr() against the
indexquals list to separate out which restriction conditions will be
applied as filter conditions; but that hasn't worked reliably since the
equivalence class machinery was invented.  So there's about a 50-50 chance
that equality index conditions will be charged as though they had to be
evaluated at each row returned by the indexscan, though of course they are
not.  Usually this means no worse than one extra cpu_operator_cost per
row, but with an expensive qual condition it could mean a lot more.

I think that fully duplicating the logic to identify redundant quals
that's in create_indexscan_plan() would likely be a mistake: the effort to
prove quals redundant shouldn't be spent on what are only hypothetical
index paths.  But we could introduce the is_redundant_derived_clause check
relatively cheaply, and that's what would matter far more of the time than
the other things.

Arguably this is a bug fix, but I'm nervous about possibly destabilizing
plan choices in the back branches, so I'm inclined to change it in HEAD
only.

            regards, tom lane

Re: Functional indexes with slow functions are misplanned

От
Jeff Janes
Дата:
On Tue, Mar 3, 2015 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jeff Janes <jeff.janes@gmail.com> writes:
> > When the results of an expression can be obtained from a functional
> index,
> > the expression never needs to be evaluated.  But the planner doesn't seem
> > to know that.  It thinks the expression is evaluated not only once per
> row,
> > but multiple times, presumably at each step as it descends the btree.
>
> Hmm ... there are a lot of things that are not done very well for
> functional indexes, but at least part of this has nothing to do with that.
>
> cost_index() thinks it can use list_difference_ptr() against the
> indexquals list to separate out which restriction conditions will be
> applied as filter conditions; but that hasn't worked reliably since the
> equivalence class machinery was invented.  So there's about a 50-50 chance
> that equality index conditions will be charged as though they had to be
> evaluated at each row returned by the indexscan, though of course they are
> not.  Usually this means no worse than one extra cpu_operator_cost per
> row, but with an expensive qual condition it could mean a lot more.
>
> I think that fully duplicating the logic to identify redundant quals
> that's in create_indexscan_plan() would likely be a mistake: the effort to
> prove quals redundant shouldn't be spent on what are only hypothetical
> index paths.  But we could introduce the is_redundant_derived_clause check
> relatively cheaply, and that's what would matter far more of the time than
> the other things.
>
> Arguably this is a bug fix, but I'm nervous about possibly destabilizing
> plan choices in the back branches, so I'm inclined to change it in HEAD
> only.
>

The changes you made in HEAD solved the problem.  Now I'll have a reason to
upgrade once 9.5 comes out.

Thanks,

Jeff