Обсуждение: Query planner/stored procedure cost

Поиск
Список
Период
Сортировка

Query planner/stored procedure cost

От
"Jason M. Felice"
Дата:
Hello...

I haven't been subscribed in a while, but I've got an issue and am trying to
determine if the Right Way(tm) is the quickest way to fix it.

Basically, I have some very expensive stored procedures that determine whether
a user should have access to particular rows in a query (not a postgresql
user, we only use one postgresql user... the user is passed as a parameter to
the function).  The logic--per row--contains about a dozen queries and probably
averages eight queries per run, with short-circuiting and all.

So it is _very_ expensive.  Given that I use this function in lots of queries
with hairy joins and all, I'd much like for the optimizer to know what to do
with the function.  Empirically, I deduce that the optimizer treats all
procedures as inexpensive (it seems to always just tack it on to the `Filter'
slot when scanning the related table).

Currently I'm using stored procedures returning multiple rows to get around
the planner on these and defer the expensive procedure until the last possible
moment (so that joins and other table criteria have a chance to filter out
a lot of records).  This typically shaves 75% of the time off of these
queries.

So, the question is:

What am I looking at in doing the following:

1) Adding a mechanism to tell PostgreSQL how expensive a procedure is  (a system table which can be updated manually,
oran existing system  table if there is a logical place for it).
 

2) Updating the planner to consider the procedure's cost in estimates.

3) Changing the query planner to consider "bubbling up" the function to  an outer filter slot.

Possibly, also:

4) Changing the planner to order expressions in a `Filter' slot by cost.

although I don't mind doing this manually and I know the order can determine
which indices PostgreSQL uses.

I'm still mulling it over, and I'm guessing the real problem here is if it
is a wise generalization that we can "bubble-up" the function.  What if the
function has side effects?  Does this break?  We can at least do procedures
with `iscachable' flag.


Disclaimer:  I haven't every really hacked the planner code, but I have a
good feel for how it works from lots and _lots_ of experience with it <g>

-Jay 'Eraserhead' Felice


Re: Query planner/stored procedure cost

От
Tom Lane
Дата:
"Jason M. Felice" <jfelice@cronosys.com> writes:
> What am I looking at in doing the following:
> 1) Adding a mechanism to tell PostgreSQL how expensive a procedure is

Something like reverting to Postgres 4.2 :-(

There are still traces in the code of someone's PhD thesis concerning
where to invoke expensive functions.  But it's extremely dead code.
Bringing it back to life would involve substantial attention from a
wizard.

> 3) Changing the query planner to consider "bubbling up" the function to
>    an outer filter slot.

This is a bit tricky now, because there is a bunch of equality-deduction
logic that assumes that WHERE quals are applied as soon as possible.
You'd have to think about the costs of abandoning those optimizations vs
the costs of evaluating the expensive function too soon.  Right offhand
I see no principled way to make that choice, because in our bottom-up
planner it has to be made before you can possibly have an idea of the
full consequences.

> 4) Changing the planner to order expressions in a `Filter' slot by cost.

This we could do --- there's already a hack to push subselects past the
other clauses.  I've hesitated to do too much in that area, because the
user might know better than the planner which clauses to apply first.
But at least it's a pretty localized decision.
        regards, tom lane