Обсуждение: Functional indexes with slow functions are misplanned
When the results of an expression can be obtained from a functional index, the expression never needs to be evaluated. But the planner doesn't seem to know that. It thinks the expression is evaluated not only once per row, but multiple times, presumably at each step as it descends the btree. So in the example given below the planner will choose to do a very expensive hash or merge join that read the entire table and calls the slow function on each row, rather than the much cheaper nested loop where the function has been pre-evaluated and stored in the index and is simply tested as part of an inner loop. Cranking up the cost of the function does no good, because that just keeps punishing the correct plan at least as much as the others. I've tested this in 9.2 and HEAD. This is a pretty silly test case, but it reproduces the real issue I've seen. create language plperl; create table foo1 as select x::text from generate_series(1,1000) foo (x); create table foo2 as select reverse(x) from foo1; --use a fast version to set up the demo, as we are impatient CREATE or replace FUNCTION slow_reverse(text) RETURNS text LANGUAGE plperl IMMUTABLE STRICT COST 1000000 AS $_X$ return reverse($_[0]); $_X$; create index on foo2 (slow_reverse(reverse)); analyze foo2; --put the slow version in place. CREATE or replace FUNCTION slow_reverse(text) RETURNS text LANGUAGE plperl IMMUTABLE STRICT COST 1000000 AS $_X$ my $foo; foreach (1..1e6) {$foo+=sqrt($_)}; return reverse($_[0]); $_X$; explain select * from foo1 where exists (select 1 from foo2 where slow_reverse(reverse)=x); --- strong-arm it into using the functional index. set enable_hashjoin TO off; set enable_mergejoin TO off; explain select * from foo1 where exists (select 1 from foo2 where slow_reverse(reverse)=x); --- see, it actually is fast! select * from foo1 where exists (select 1 from foo2 where slow_reverse(reverse)=x); Cheers, Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > When the results of an expression can be obtained from a functional index, > the expression never needs to be evaluated. But the planner doesn't seem > to know that. It thinks the expression is evaluated not only once per row, > but multiple times, presumably at each step as it descends the btree. Hmm ... there are a lot of things that are not done very well for functional indexes, but at least part of this has nothing to do with that. cost_index() thinks it can use list_difference_ptr() against the indexquals list to separate out which restriction conditions will be applied as filter conditions; but that hasn't worked reliably since the equivalence class machinery was invented. So there's about a 50-50 chance that equality index conditions will be charged as though they had to be evaluated at each row returned by the indexscan, though of course they are not. Usually this means no worse than one extra cpu_operator_cost per row, but with an expensive qual condition it could mean a lot more. I think that fully duplicating the logic to identify redundant quals that's in create_indexscan_plan() would likely be a mistake: the effort to prove quals redundant shouldn't be spent on what are only hypothetical index paths. But we could introduce the is_redundant_derived_clause check relatively cheaply, and that's what would matter far more of the time than the other things. Arguably this is a bug fix, but I'm nervous about possibly destabilizing plan choices in the back branches, so I'm inclined to change it in HEAD only. regards, tom lane
On Tue, Mar 3, 2015 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Janes <jeff.janes@gmail.com> writes: > > When the results of an expression can be obtained from a functional > index, > > the expression never needs to be evaluated. But the planner doesn't seem > > to know that. It thinks the expression is evaluated not only once per > row, > > but multiple times, presumably at each step as it descends the btree. > > Hmm ... there are a lot of things that are not done very well for > functional indexes, but at least part of this has nothing to do with that. > > cost_index() thinks it can use list_difference_ptr() against the > indexquals list to separate out which restriction conditions will be > applied as filter conditions; but that hasn't worked reliably since the > equivalence class machinery was invented. So there's about a 50-50 chance > that equality index conditions will be charged as though they had to be > evaluated at each row returned by the indexscan, though of course they are > not. Usually this means no worse than one extra cpu_operator_cost per > row, but with an expensive qual condition it could mean a lot more. > > I think that fully duplicating the logic to identify redundant quals > that's in create_indexscan_plan() would likely be a mistake: the effort to > prove quals redundant shouldn't be spent on what are only hypothetical > index paths. But we could introduce the is_redundant_derived_clause check > relatively cheaply, and that's what would matter far more of the time than > the other things. > > Arguably this is a bug fix, but I'm nervous about possibly destabilizing > plan choices in the back branches, so I'm inclined to change it in HEAD > only. > The changes you made in HEAD solved the problem. Now I'll have a reason to upgrade once 9.5 comes out. Thanks, Jeff