Re: Cost estimates for parameterized paths

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Cost estimates for parameterized paths
Дата
Msg-id 21254.1320876766@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Cost estimates for parameterized paths  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Cost estimates for parameterized paths  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
More than a year ago, I wrote in
http://archives.postgresql.org/message-id/14624.1283463072@sss.pgh.pa.us

> Awhile back I ranted about replacing the planner's concept of inner
> indexscans with a more generalized notion of "parameterized paths":
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php

> The executor fixes for that are done, and now I'm grappling with getting
> the planner to do something useful with it.  The biggest problem I've run
> into is that a parameterized path can't really be assigned a fixed cost
> in the same way that a normal path can.  The current implementation of
> cost_index() depends on knowing the size of the outer relation --- that
> is, the expected number of execution loops for the indexscan --- in order
> to account for cache effects sanely while estimating the average cost of
> any one inner indexscan.

Since this project has been stalled for so long, I am thinking that what
I need to do to make some progress is to punt on the repeated-execution
cache effects problem, at least for the first cut.  I propose costing
parameterized inner paths on the "worst case" basis that they're only
executed once, and don't get any benefit from caching across repeated
executions.  This seems like a reasonably sane first-order approximation
on two grounds:

1. In most of the cases where such a plan is of interest, the outer
relation for the nestloop actually does provide only one or a few rows.
If it generates a lot of rows, you probably don't want a nestloop
anyhow.

2. In the cases where we really care, the alternatives are so much worse
that the parameterized nestloop will win even if it's estimated very
conservatively.

Another thing in the back of my mind is that the whole issue of cache
effects is something we know we don't model very well, so putting large
amounts of time into enlarging the present approach to handle more
complicated plan structures may be misplaced effort anyway.

So unless somebody's got a better idea, I'm going to push forward with
getting the parameterized-path infrastructure in place, and just settle
for a crude cost model for the moment.  Perhaps the implementation
experience will shake loose some new ideas about how to do the cost
modeling.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dimitri Fontaine
Дата:
Сообщение: Re: const correctness
Следующее
От: Tom Lane
Дата:
Сообщение: Re: heap vacuum & cleanup locks