Discussion: [HACKERS] Linear vs. nonlinear planner cost estimates

[HACKERS] Linear vs. nonlinear planner cost estimates

From: Tom Lane
Date:
I've been looking into the problem reported here:
https://www.postgresql.org/message-id/CAOfJSTwzQ7Fx6Yjeg9mFkMsM5OVKPoa=EgkHceGdkr1TWg8vkA@mail.gmail.com

I can reproduce similar misbehavior with this test case:

create table updates as
select (-log(random()) * 10)::int as driver_id,
       now() + x * '1 second'::interval as time,
       random() as junk
from generate_series(1,15000000) x;

(I've intentionally set up this test data so that "driver_id" is randomly
ordered while the "time" column is perfectly in table order.  While I
don't have direct evidence, it seems pretty plausible that the original
problem table might be somewhat like that.)

set maintenance_work_mem = '100MB';
set default_statistics_target = 10000;
vacuum analyze updates;
create index on updates(driver_id, time);

explain SELECT * FROM updates WHERE driver_id=1 ORDER BY "time" DESC LIMIT 1;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..0.73 rows=1 width=20)
   ->  Index Scan Backward using updates_driver_id_time_idx on updates  (cost=0.56..470495.61 rows=2750660 width=20)
         Index Cond: (driver_id = 1)
(3 rows)

That's all good, but if we create this less-well-adapted index, the
planner prefers it:

create index on updates(time);

explain SELECT * FROM updates WHERE driver_id=1 ORDER BY "time" DESC LIMIT 1;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.62 rows=1 width=20)
   ->  Index Scan Backward using updates_time_idx on updates  (cost=0.43..522569.43 rows=2750660 width=20)
         Filter: (driver_id = 1)
(3 rows)

Why did that happen?  The filter can be expected to reject about four out
of every five tuples, so this plan should be expected to cost about five
times as much to get the first row.  (If the driver_id values aren't
randomly located, things could be far worse; but the point here is that
even under the planner's assumption of uncorrelated row order, this should
not be considered a preferable plan.)

I think the reason is that, because the planner knows that using this
index will require a full-index scan, it's costing the updates_time_idx
plan on the basis of a full scan.  Moreover, that index is perfectly
correlated with the physical table order, so it gets a low cost per fetch.
Evidently, the cost is coming out at about 522569/15000000 or 0.035 cost
units per tuple fetched.  Meanwhile, the better plan is being costed with
the knowledge that it fetches only one-fifth of the table; and we also see
that that index has zero ordering correlation with the table.  So the cost
per tuple fetched is coming out at about 0.171, which is enough more that
even having to fetch five tuples instead of one makes the poorer plan look
cheaper.

This is, of course, nonsense; we're costing the plans on the basis of
fetching many more tuples than will actually happen, and the cost
amortization effects that cost_index is accounting for will not happen.
So in short, we're taking a fundamentally nonlinear cost estimate from
cost_index and then scaling it linearly to try to account for the LIMIT.
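
To make the arithmetic concrete, here's a back-of-the-envelope sketch in
standalone C.  The numbers are lifted straight from the EXPLAIN output
above, and the formula is just the linear scaling described here (startup
plus the wanted fraction of the remaining run cost), not the planner's
actual code:

#include <stdio.h>

/* Linear LIMIT costing: startup plus the fraction of rows we expect to
 * pull, times the remaining run cost.  A simplification of what the
 * planner does, for illustration only. */
static double
limit_cost(double startup, double total, double rows_expected, double limit)
{
    return startup + (limit / rows_expected) * (total - startup);
}

int
main(void)
{
    double rows = 2750660.0;        /* estimated rows with driver_id = 1 */

    /* Good index (driver_id, time): the Index Cond picks out the rows. */
    double good = limit_cost(0.56, 470495.61, rows, 1.0);

    /* Bad index (time): full backward scan with a Filter. */
    double bad = limit_cost(0.43, 522569.43, rows, 1.0);

    printf("good index, LIMIT 1: %.2f\n", good);    /* ~0.73, as in EXPLAIN */
    printf("bad index,  LIMIT 1: %.2f\n", bad);     /* ~0.62, looks cheaper */

    /* In reality the Filter passes only ~1 row in 5.45, so the "bad" plan
     * must fetch several tuples before the first row comes out -- and its
     * 0.035-per-tuple figure already assumes the amortization of a full,
     * perfectly correlated scan that will never happen under LIMIT 1. */
    printf("tuples the bad plan really fetches per output row: ~%.1f\n",
           15000000.0 / rows);
    return 0;
}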

I've known that this was a theoretical hazard of the way we do LIMIT
costing for some time (I recall bitching about it in my 2011 PGCon talk).
But this is the first case I've seen where it results in a wrong choice
in a relatively simple query.

I don't have a concrete proposal right now about how to fix this.  The
most expansive response would be to decorate every path with an explicitly
nonlinear cost function, which would need to be able to report the cost
to fetch the first N tuples rather than the total scan cost.  That would
be a lot of work though.
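
Just to illustrate the shape of that idea, here's a purely hypothetical
sketch -- none of these names exist in the sources, and the default
implementation shown is simply today's linear behavior; the point would be
that plan types like the filtered indexscan above could supply something
smarter:

#include <stdio.h>

typedef double Cost;
typedef struct Path Path;

/* Hypothetical hook: report the cost to fetch the first N tuples. */
typedef Cost (*PathFractionCost) (const Path *path, double ntuples);

struct Path
{
    Cost        startup_cost;
    Cost        total_cost;
    double      rows;
    PathFractionCost cost_to_fetch;     /* hypothetical decoration */
};

/* Default: the linear interpolation we effectively use today. */
static Cost
linear_cost_to_fetch(const Path *path, double ntuples)
{
    return path->startup_cost +
        (ntuples / path->rows) * (path->total_cost - path->startup_cost);
}

int
main(void)
{
    Path        timeidx = {0.43, 522569.43, 2750660.0, linear_cost_to_fetch};

    printf("linear estimate for first tuple: %.2f\n",
           timeidx.cost_to_fetch(&timeidx, 1.0));
    return 0;
}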

Maybe we could make this work just by setting the startup cost of an
indexscan differently.  We could redefine "startup cost" as "cost to fetch
the first tuple and then stop", and compute that independently of the
total cost for plan types where it'd matter.  That would fix the problem
for LIMIT 1 cases, and even for cases with a larger limit, scaling
linearly from that to the total cost would produce a better estimate than
what we get now.  But it feels like it's still a band-aid.
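
In sketch form, the scaling for larger limits would look about like the
function below.  The first-tuple figure is a placeholder, since the whole
point is that it would be computed independently for each plan type:

#include <stdio.h>

/* Proposed interpolation: "startup cost" is the cost to fetch the first
 * tuple and stop; larger limits scale linearly from there to the total. */
static double
limit_cost_proposed(double first_tuple, double total, double rows, double limit)
{
    if (limit <= 1.0 || rows <= 1.0)
        return first_tuple;
    return first_tuple + ((limit - 1.0) / (rows - 1.0)) * (total - first_tuple);
}

int
main(void)
{
    /* Hypothetical plan: an expensive first tuple (several candidate
     * tuples must be fetched and filtered before one passes the qual),
     * with the total cost and row count from the example above. */
    double first_tuple = 5.0;
    double total = 522569.43;
    double rows = 2750660.0;

    printf("LIMIT 1:    %.2f\n", limit_cost_proposed(first_tuple, total, rows, 1));
    printf("LIMIT 100:  %.2f\n", limit_cost_proposed(first_tuple, total, rows, 100));
    printf("LIMIT 1000: %.2f\n", limit_cost_proposed(first_tuple, total, rows, 1000));
    return 0;
}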

Thoughts?
        regards, tom lane



Re: [HACKERS] Linear vs. nonlinear planner cost estimates

From: Robert Haas
Date:
On Wed, Dec 14, 2016 at 2:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't have a concrete proposal right now about how to fix this.  The
> most expansive response would be to decorate every path with an explicitly
> nonlinear cost function, which would need to be able to report the cost
> to fetch the first N tuples rather than the total scan cost.  That would
> be a lot of work though.

And it would have some significant overhead, I bet.

> Maybe we could make this work just by setting the startup cost of an
> indexscan differently.  We could redefine "startup cost" as "cost to fetch
> the first tuple and then stop", and compute that independently of the
> total cost for plan types where it'd matter.  That would fix the problem
> for LIMIT 1 cases, and even for cases with a larger limit, scaling
> linearly from that to the total cost would produce a better estimate than
> what we get now.  But it feels like it's still a band-aid.

I think that would be a good place to start.  I bet it would help
materially, and it's a lot less work than anything more sophisticated.
There are plenty of cases (semi joins, and many nested loops that are
not semi joins) where the number of rows that we expect to fetch is 1
or very close to 1; LIMIT 100 or LIMIT 1000 happens, but it's a lot
less common.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Linear vs. nonlinear planner cost estimates

From: Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 14, 2016 at 2:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't have a concrete proposal right now about how to fix this.  The
>> most expansive response would be to decorate every path with an explicitly
>> nonlinear cost function, which would need to be able to report the cost
>> to fetch the first N tuples rather than the total scan cost.  That would
>> be a lot of work though.

> And it would have some significant overhead, I bet.

I think the extra cost of running such a function would only come in when
we're actually considering a query with LIMIT and are ready to choose
which path gets the LIMIT stuck on top of it.  That seems okay to me.
My real pain point with extra planner work is when it happens in queries
that get no benefit.

The really *critical* aspect of this, which could cost far more than a
few extra executions of a costing function, comes in if it damages
add_path's ability to prune inferior paths early.  So we would not want
to simply drop the startup_cost field; and we certainly don't want
add_path calling this function during every path comparison, either.
So my thought is that we'd need to keep startup_cost for add_path to
use, though we could (and likely should) redefine it as below.  We
should still be able to assume that if path A dominates path B at
first-tuple and last-tuple costs, it dominates everywhere in between.
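
Spelled out, that's just the fact that linear interpolation preserves
endpoint dominance:

    cost_P(f) = (1 - f) * startup_P + f * total_P,        0 <= f <= 1

so if startup_A <= startup_B and total_A <= total_B, then
cost_A(f) <= cost_B(f) at every fraction f in between.  (That's a property
of the linear estimates, of course, not necessarily of the true cost
curves.)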

A thought that occurred to me after more study of the problem example
is that the "good" index is a bit bigger than the "bad" one, meaning
it will get charged a bit more in index descent costs.  With our current
definition of startup_cost as being only the index descent cost, that
means the good index looks worse on startup_cost than the bad one.  In
this example, the good index should survive the add_path tournament
anyway because it has better total_cost --- but it's easy to imagine
cases where that wouldn't be true.  So I'm thinking that redefining
startup_cost as time to fetch the first tuple would be a good thing
in terms of making sure that desirable plans don't lose out in add_path.

>> Maybe we could make this work just by setting the startup cost of an
>> indexscan differently.  We could redefine "startup cost" as "cost to fetch
>> the first tuple and then stop", and compute that independently of the
>> total cost for plan types where it'd matter.  That would fix the problem
>> for LIMIT 1 cases, and even for cases with a larger limit, scaling
>> linearly from that to the total cost would produce a better estimate than
>> what we get now.  But it feels like it's still a band-aid.

> I think that would be a good place to start.  I bet it would help
> materially, and it's a lot less work than anything more sophisticated.
> There are plenty of cases (semi joins, and many nested loops that are
> not semi joins) where the number of rows that we expect to fetch is 1
> or very close to 1; LIMIT 100 or LIMIT 1000 happens, but it's a lot
> less common.

Yeah.  Also, per the further thoughts above, we'd probably still end up
redefining startup_cost like this even if we then plastered on a function
to do more complex interpolation in between first and last tuple.  So
it'd make sense to do this as a first step even if we add the additional
mechanism later.

I'll put this on my to-do list...
        regards, tom lane



Re: [HACKERS] Linear vs. nonlinear planner cost estimates

From: Robert Haas
Date:
On Wed, Dec 14, 2016 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The really *critical* aspect of this, which could cost far more than a
> few extra executions of a costing function, comes in if it damages
> add_path's ability to prune inferior paths early.  So we would not want
> to simply drop the startup_cost field; and we certainly don't want
> add_path calling this function during every path comparison, either.

Agreed.

> So my thought is that we'd need to keep startup_cost for add_path to
> use, though we could (and likely should) redefine it as below.  We
> should still be able to assume that if path A dominates path B at
> first-tuple and last-tuple costs, it dominates everywhere in between.

In theory, there's no reason that has to be true.   B could start out
a little more expensive than A, and then grow very slowly thereafter
until suddenly becoming super-expensive for the very last tuple, so
that for limits more than 1 but less than the total number of tuples
produced B actually wins.  This actually isn't a totally unrealistic
case: consider an index scan with a filter condition where all of the
tuples that pass the filter are clustered near the beginning.  Getting
any number of tuples that are actually present is cheap, but the very
last fetch that fails to find any more tuples is crushingly expensive.
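
With made-up numbers, that scenario looks about like this (standalone C,
nothing here comes from a real cost model):

#include <stdio.h>

#define TOTAL_ROWS 1000.0

/* Path A: cost grows roughly linearly with tuples fetched. */
static double
true_cost_a(double n)
{
    return 1.0 + 0.10 * n;              /* startup ~1, total ~101 */
}

/* Path B: slightly higher startup, nearly flat while the clustered
 * matches last, then a huge charge for the final probe that scans the
 * rest of the index and finds nothing. */
static double
true_cost_b(double n)
{
    double      cost = 2.0 + 0.001 * n; /* startup ~2 */

    if (n >= TOTAL_ROWS)
        cost += 200.0;                  /* total ~203 */
    return cost;
}

int
main(void)
{
    double      limits[] = {1, 10, 100, 500, 1000};
    int         i;

    /* A dominates B at both endpoints (first tuple and last tuple),
     * yet B wins for most intermediate limits. */
    for (i = 0; i < 5; i++)
    {
        double      n = limits[i];

        printf("LIMIT %4.0f: A = %7.2f  B = %7.2f  -> %s\n",
               n, true_cost_a(n), true_cost_b(n),
               true_cost_a(n) <= true_cost_b(n) ? "A wins" : "B wins");
    }
    return 0;
}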

That having been said, I doubt it's realistic to make the cost model
good enough to hope we'd get such cases correct, so making the
assumption that you propose is probably the right idea anyway.

> A thought that occurred to me after more study of the problem example
> is that the "good" index is a bit bigger than the "bad" one, meaning
> it will get charged a bit more in index descent costs.  With our current
> definition of startup_cost as being only the index descent cost, that
> means the good index looks worse on startup_cost than the bad one.  In
> this example, the good index should survive the add_path tournament
> anyway because it has better total_cost --- but it's easy to imagine
> cases where that wouldn't be true.  So I'm thinking that redefining
> startup_cost as time to fetch the first tuple would be a good thing
> in terms of making sure that desirable plans don't lose out in add_path.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company