Обсуждение: Prepared statement's planning

Поиск
Список
Период
Сортировка

Prepared statement's planning

От
"Vyacheslav Kalinin"
Дата:
Hello,

I would appreciate if someone explained me how exactly prepared parametrized statements are planned, i.e. what kind of assumptions planner makes on param values, selectivity, expected row count etc. that affect in particular whether indexes will be used or not. For instance consider the following case:

create table t1 ( id serial, val int);

insert into t1 (val)
select trunc(100000*random())
  from generate_series(1, 1000000);

create index idx_t1 on t1(val);

analyze t1;
------------------------------

1.
prepare stmt (int) as
 select * from t1 where val < $1;
 
explain execute stmt(100000);

-----------------------
QUERY PLAN
Seq Scan on t1  (cost= 0.00..17401.94 rows=333332 width=8)
 Filter: (val < $1)

2.
prepare stmt (int, int) as
 select * from t1 where val > $1 and val < $2;
 
explain execute stmt(20000, 30000);

-----------------------
QUERY PLAN
Bitmap Heap Scan on t1  (cost= 151.74..5307.59 rows=5000 width=8)
  Recheck Cond: ((val > $1) AND (val < $2))
  ->  Bitmap Index Scan on idx_t1  (cost=0.00..150.49 rows=5000 width=0)
        Index Cond: ((val > $1) AND (val < $2))

Hmm, why does it expect 5000 rows here? What influences this expectation?

3.
prepare stmt (int) as
 select * from t1 where val = $1 or $1 is null;

explain execute stmt(20000);

QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=5013 width=8)
  Filter: ((val = $1) OR ($1 IS NULL))

That's the weirdest behavior: where did 5013 rows assumption came from? Why use seq scan then? I should mention that planner refuses to use anything but seq scan here even if I explicitly disable it with "set enable_seqscan to off".

In general, I wonder if one could get somewhat predictable planner behavior in such cases since we have a lot of code written in plpgsql and the patterns above are pretty common there.

Thanks,
Viatcheslav

Re: Prepared statement's planning

От
Tom Lane
Дата:
"Vyacheslav Kalinin" <vka@mgcp.com> writes:
> QUERY PLAN
> Bitmap Heap Scan on t1  (cost= 151.74..5307.59 rows=5000 width=8)
>   Recheck Cond: ((val > $1) AND (val < $2))
>   ->  Bitmap Index Scan on idx_t1  (cost=0.00..150.49 rows=5000 width=0)
>         Index Cond: ((val > $1) AND (val < $2))

> Hmm, why does it expect 5000 rows here? What influences this expectation?

It can see that it's got a range constraint on 'val', but not exactly
how wide the range is, so the selectivity estimate is DEFAULT_RANGE_INEQ_SEL
which is hardwired at 0.005.  0.005 * 1000000 = 5000.

> prepare stmt (int) as
>  select * from t1 where val = $1 or $1 is null;
> explain execute stmt(20000);
> QUERY PLAN
> Seq Scan on t1  (cost=0.00..17401.94 rows=5013 width=8)
>   Filter: ((val = $1) OR ($1 IS NULL))

> That's the weirdest behavior: where did 5013 rows assumption came from? Why
> use seq scan then? I should mention that planner refuses to use anything but
> seq scan here even if I explicitly disable it with "set enable_seqscan to
> off".

It hasn't got a lot of choice: if $1 is null it will have to return the
whole table, a case for which an indexscan is unsuitable.  I think this
falls in the category of "bad query design" not "bad planning".  If that
really is the behavior you want, and not a typo, consider something like

prepare stmt (int) as
 select * from t1 where val = $1
union all
 select * from t1 where $1 is null;

As for the rowcount estimate, I think it's using DEFAULT_UNK_SEL (which
also happens to be 0.005) as the selectivity of an IS NULL test with a
non-Var target.  That's a bit useless in this case, since for any one
call of the query it's either going to be constant true or constant
false, but there's not any obvious better way to do it.

            regards, tom lane

Re: Prepared statement's planning

От
"Vyacheslav Kalinin"
Дата:


On Jan 15, 2008 7:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If that really is the behavior you want, and not a typo

It is, most of parameters passed to a plpgsql function are in fact parts of the filter and if certain filter item is null it is considered unknown and we don't want it to affect the result in that case. The query pattern itself is a shortcut to cover both cases in one expressions which works fine while applied to secondary fields and delivers troubles when used on key indexed fields that might influence planning decisions. Thanks for the "union" idea, other ways around I could think of are dynamic queries or a bunch of plpgsql's 'if-elseif's.

> It can see that it's got a range constraint on 'val', but not exactly
> how wide the range is, so the selectivity estimate is DEFAULT_RANGE_INEQ_SEL
> which is hardwired at 0.005.  0.005 * 1000000 = 5000.

Probably this (parametrized query's planning) is something worth mentioning in the docs one day.

> As for the rowcount estimate, I think it's using DEFAULT_UNK_SEL (which
> also happens to be 0.005) as the selectivity of an IS NULL test with a
> non-Var target.  That's a bit useless in this case, since for any one
> call of the query it's either going to be constant true or constant
> false, but there's not any obvious better way to do it.

Hm.. It could build union'ed plans for every possible value of unknown boolean expression similar to that of your example's, though this could make plans grow huge.