Обсуждение: Future directions for inheritance-hierarchy statistics

Поиск
Список
Период
Сортировка

Future directions for inheritance-hierarchy statistics

От
Tom Lane
Дата:
A few days ago I posted a very-much-WIP patch for making the planner
dynamically combine statistics for each member of an appendrel:
http://www.postgresql.org/message-id/22598.1425686096@sss.pgh.pa.us

That patch was only intended to handle the case of an appendrel generated
by a UNION ALL construct.  But it occurs to me that we could easily
change it to also apply to appendrels generated from inheritance trees.
Then we'd no longer need the whole-inheritance-tree statistics that
ANALYZE currently produces, because we'd only ever look at per-table
statistics in pg_statistic.

This would have one significant drawback, which is that planning for
large inheritance trees (many children) would probably get noticeably
slower.  (But in the common case that constraint exclusion limits a
query to scanning just one or a few children, the hit would be small.)

On the other hand, there would be two very significant benefits.
First, that we would automatically get statistics that account for
partitions being eliminated by constraint exclusion, because only the
non-eliminated partitions are present in the appendrel.  And second,
that we'd be able to forget the whole problem of getting autovacuum
to create whole-inheritance-tree stats.  Right now I'm doubtful that
typical users are getting good up-to-date stats at all for queries of
this sort, because autovacuum will only update those stats if it decides
it needs to analyze the parent table.  Which is commonly empty, so that
there's never a reason to fire an analyze on it.  (We'd left this as
a problem to be solved later when we put in the whole-tree stats
feature in 9.0, but no progress has been made on solving it.)

So I think that going in this direction is clearly a win and we ought
to pursue it.  It's not happening for 9.5 of course, because there's
still a great deal of work to do before anything like this would be
committable.  But I would like to establish a consensus that this
would be a sensible thing to do in 9.6.

The reason I bring it up now is that the inheritance-for-foreign-tables
patch has some code that I don't much like for controlling what happens
with those whole-tree stats when some of the children are foreign tables
that lack ANALYZE support.  If the long-term plan is that whole-tree
stats are going away altogether, then it won't be terribly important
exactly what happens in that case, so we can just do some simple/easy
kluge in the short term and not have to have an argument about what's
the best thing to do.

Comments?
        regards, tom lane



Re: Future directions for inheritance-hierarchy statistics

От
Etsuro Fujita
Дата:
On 2015/03/17 5:18, Tom Lane wrote:
> A few days ago I posted a very-much-WIP patch for making the planner
> dynamically combine statistics for each member of an appendrel:
> http://www.postgresql.org/message-id/22598.1425686096@sss.pgh.pa.us
> 
> That patch was only intended to handle the case of an appendrel generated
> by a UNION ALL construct.  But it occurs to me that we could easily
> change it to also apply to appendrels generated from inheritance trees.
> Then we'd no longer need the whole-inheritance-tree statistics that
> ANALYZE currently produces, because we'd only ever look at per-table
> statistics in pg_statistic.
> 
> This would have one significant drawback, which is that planning for
> large inheritance trees (many children) would probably get noticeably
> slower.  (But in the common case that constraint exclusion limits a
> query to scanning just one or a few children, the hit would be small.)
> 
> On the other hand, there would be two very significant benefits.
> First, that we would automatically get statistics that account for
> partitions being eliminated by constraint exclusion, because only the
> non-eliminated partitions are present in the appendrel.  And second,
> that we'd be able to forget the whole problem of getting autovacuum
> to create whole-inheritance-tree stats.  Right now I'm doubtful that
> typical users are getting good up-to-date stats at all for queries of
> this sort, because autovacuum will only update those stats if it decides
> it needs to analyze the parent table.  Which is commonly empty, so that
> there's never a reason to fire an analyze on it.  (We'd left this as
> a problem to be solved later when we put in the whole-tree stats
> feature in 9.0, but no progress has been made on solving it.)
> 
> So I think that going in this direction is clearly a win and we ought
> to pursue it.  It's not happening for 9.5 of course, because there's
> still a great deal of work to do before anything like this would be
> committable.  But I would like to establish a consensus that this
> would be a sensible thing to do in 9.6.
> 
> The reason I bring it up now is that the inheritance-for-foreign-tables
> patch has some code that I don't much like for controlling what happens
> with those whole-tree stats when some of the children are foreign tables
> that lack ANALYZE support.  If the long-term plan is that whole-tree
> stats are going away altogether, then it won't be terribly important
> exactly what happens in that case, so we can just do some simple/easy
> kluge in the short term and not have to have an argument about what's
> the best thing to do.

That seems like a good idea.

Best regards,
Etsuro Fujita



Re: Future directions for inheritance-hierarchy statistics

От
Robert Haas
Дата:
For some reason, I didn't get Tom's email, only this reply.

On Tue, Mar 17, 2015 at 3:44 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> On 2015/03/17 5:18, Tom Lane wrote:
>> A few days ago I posted a very-much-WIP patch for making the planner
>> dynamically combine statistics for each member of an appendrel:
>> http://www.postgresql.org/message-id/22598.1425686096@sss.pgh.pa.us
>>
>> That patch was only intended to handle the case of an appendrel generated
>> by a UNION ALL construct.  But it occurs to me that we could easily
>> change it to also apply to appendrels generated from inheritance trees.
>> Then we'd no longer need the whole-inheritance-tree statistics that
>> ANALYZE currently produces, because we'd only ever look at per-table
>> statistics in pg_statistic.
>>
>> This would have one significant drawback, which is that planning for
>> large inheritance trees (many children) would probably get noticeably
>> slower.  (But in the common case that constraint exclusion limits a
>> query to scanning just one or a few children, the hit would be small.)

That's a pretty big drawback.  I'm not sure whether it's big enough to
sink the whole idea, but we really need to make planning time on large
inheritance trees cheaper, not more expensive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Future directions for inheritance-hierarchy statistics

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> For some reason, I didn't get Tom's email, only this reply.
>> On 2015/03/17 5:18, Tom Lane wrote:
>>> This would have one significant drawback, which is that planning for
>>> large inheritance trees (many children) would probably get noticeably
>>> slower.  (But in the common case that constraint exclusion limits a
>>> query to scanning just one or a few children, the hit would be small.)

> That's a pretty big drawback.  I'm not sure whether it's big enough to
> sink the whole idea, but we really need to make planning time on large
> inheritance trees cheaper, not more expensive.

Ah, but note the point about how there's no added cost for partitions that
are removed by constraint exclusion.  That should mean that in practical
use it's not a huge problem.  (If you're going to scan K partitions, you
should not be surprised that planning time is O(K).  It will be anyway
thanks to other things such as index selection.)

Also, you're ignoring the prospect of getting better estimates and hence
better plans through having stats that dynamically adapt to the set of
partitions being scanned.  Given the lousy state of maintenance of
whole-tree stats, I really think that this consideration might outweigh
even a significant planning-time hit.  Shaving planning time by producing
crappy estimates isn't usually a good tradeoff.
        regards, tom lane



Re: Future directions for inheritance-hierarchy statistics

От
Robert Haas
Дата:
On Tue, Mar 17, 2015 at 11:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> This would have one significant drawback, which is that planning for
>>>> large inheritance trees (many children) would probably get noticeably
>>>> slower.  (But in the common case that constraint exclusion limits a
>>>> query to scanning just one or a few children, the hit would be small.)
>
>> That's a pretty big drawback.  I'm not sure whether it's big enough to
>> sink the whole idea, but we really need to make planning time on large
>> inheritance trees cheaper, not more expensive.
>
> Ah, but note the point about how there's no added cost for partitions that
> are removed by constraint exclusion.  That should mean that in practical
> use it's not a huge problem.  (If you're going to scan K partitions, you
> should not be surprised that planning time is O(K).  It will be anyway
> thanks to other things such as index selection.)
>
> Also, you're ignoring the prospect of getting better estimates and hence
> better plans through having stats that dynamically adapt to the set of
> partitions being scanned.  Given the lousy state of maintenance of
> whole-tree stats, I really think that this consideration might outweigh
> even a significant planning-time hit.  Shaving planning time by producing
> crappy estimates isn't usually a good tradeoff.

Perhaps so, but I know that the planning time of large inheritance
trees has been a major issue for some of EnterpriseDB's customers.  In
fact, EnterpriseDB has run into a number of customer situations where
planning time even for non-inheritance queries is substantially higher
than, shall we say, a competing commercial product.  With inheritance,
even people who aren't making comparisons with other products start to
get unhappy.  I've always been very pleased with the quality of plans
that our planner generates, but it's becoming increasingly clear to me
that at least one other product is able to provide good plans at a
significantly lower CPU cost, and inheritance is particular trouble
spot.  I don't know exactly what we ought to do about that and perhaps
it's to one side of the issue you're raising here, but I do think it's
an issue that we (the PostgreSQL community) ought to be thinking
about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Future directions for inheritance-hierarchy statistics

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 17, 2015 at 11:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, you're ignoring the prospect of getting better estimates and hence
>> better plans through having stats that dynamically adapt to the set of
>> partitions being scanned.  Given the lousy state of maintenance of
>> whole-tree stats, I really think that this consideration might outweigh
>> even a significant planning-time hit.  Shaving planning time by producing
>> crappy estimates isn't usually a good tradeoff.

> Perhaps so, but I know that the planning time of large inheritance
> trees has been a major issue for some of EnterpriseDB's customers.  In
> fact, EnterpriseDB has run into a number of customer situations where
> planning time even for non-inheritance queries is substantially higher
> than, shall we say, a competing commercial product.  With inheritance,
> even people who aren't making comparisons with other products start to
> get unhappy.  I've always been very pleased with the quality of plans
> that our planner generates, but it's becoming increasingly clear to me
> that at least one other product is able to provide good plans at a
> significantly lower CPU cost, and inheritance is particular trouble
> spot.  I don't know exactly what we ought to do about that and perhaps
> it's to one side of the issue you're raising here, but I do think it's
> an issue that we (the PostgreSQL community) ought to be thinking
> about.

Well, we know that the current approach to inheritance isn't very well
attuned to standard partitioning situations, because it treats every
inheritance child as a de novo problem.  I continue to maintain that
the right fix for that is a partitioning feature that forbids any schema
variation across partitions, which the planner would use to avoid doing
O(N) work when dealing with an N-partition table.  Worrying about
changes that would already be involving less than O(N) work is rather
pointless in this context, IMO.
        regards, tom lane



Re: Future directions for inheritance-hierarchy statistics

От
Jim Nasby
Дата:
On 3/18/15 8:26 AM, Robert Haas wrote:
> In
> fact, EnterpriseDB has run into a number of customer situations where
> planning time even for non-inheritance queries is substantially higher
> than, shall we say, a competing commercial product.

If it's the commercial product I'm thinking of, they use multiple levels 
of caching to avoid both parse costs as well as plan costs. It's always 
impressed me that we didn't have to resort to such shenanigans, but 
perhaps there's only so long we can avoid them.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com