Discussion: Shouldn't we have a way to avoid "risky" plans?
Folks,

Yet more evidence that we need some way to assess query plans which are
high-risk and avoid them (or have Yet Another GUC):

 Merge Join  (cost=29.16..1648.00 rows=382 width=78) (actual time=57215.167..57215.216 rows=1 loops=1)
   Merge Cond: (rn.node_id = device_nodes.node_id)
   ->  Nested Loop  (cost=0.00..11301882.40 rows=6998 width=62) (actual time=57209.291..57215.030 rows=112 loops=1)
         Join Filter: (node_ep.node_id = rn.node_id)
         ->  Nested Loop  (cost=0.00..11003966.85 rows=90276 width=46) (actual time=0.027..52792.422 rows=90195 loops=1)
               ->  Index Scan using ix_ne_ns on node_ep  (cost=0.00..1545943.45 rows=32606992 width=26) (actual time=0.010..7787.043 rows=32606903 loops=1)
               ->  Index Scan using ix_nefp_eid on ep_fp  (cost=0.00..0.28 rows=1 width=20) (actual time=0.001..0.001 rows=0 loops=32606903)
                     Index Cond: (ep_fp.ep_id = node_ep.ep_id)
         ->  Materialize  (cost=0.00..5.30 rows=220 width=16) (actual time=0.000..0.019 rows=220 loops=90195)
               ->  Seq Scan on mytable rn  (cost=0.00..4.20 rows=220 width=16) (actual time=0.008..0.043 rows=220 loops=1)
   ->  Sort  (cost=28.18..28.21 rows=12 width=16) (actual time=0.164..0.165 rows=10 loops=1)
         Sort Key: device_nodes.node_id
         Sort Method:  quicksort  Memory: 25kB
         ->  Index Scan using ix_dn_did on device_nodes  (cost=0.00..27.96 rows=12 width=16) (actual time=0.086..0.134 rows=10 loops=1)
               Index Cond: (dev_id = 18165)
 Total runtime: 57215.329 ms

AFAICT, what's happening in this query is that PostgreSQL's statistics
on the device_nodes and several other tables are slightly out of date
(as in 5% of the table). Thus it thinks that nothing will match the
list of node_ids in "mytable", and that it can exit the merge join
early and ignore the whole huge cost of the join plan. This particular
form of out-of-dateness will be fixed in 9.1 (it's due to values being
higher than the highest histogram bucket in pg_stats), but not all
forms will be.

It really seems like we should be able to detect an obvious high-risk
situation like this one. Or maybe we're just being too optimistic about
discarding subplans?

BTW, the optimal plan for this query (post-analyze) is this one:

 Nested Loop  (cost=0.00..213068.26 rows=12 width=78) (actual time=0.374..0.514 rows=1 loops=1)
   Join Filter: (device_nodes.node_id = rn.node_id)
   ->  Seq Scan on mytable rn  (cost=0.00..4.20 rows=220 width=16) (actual time=0.013..0.050 rows=220 loops=1)
   ->  Materialize  (cost=0.00..213024.49 rows=12 width=62) (actual time=0.001..0.002 rows=1 loops=220)
         ->  Nested Loop  (cost=0.00..213024.43 rows=12 width=62) (actual time=0.077..0.278 rows=1 loops=1)
               ->  Nested Loop  (cost=0.00..211740.04 rows=4428 width=42) (actual time=0.070..0.269 rows=1 loops=1)
                     ->  Index Scan using ix_dn_did on device_nodes  (cost=0.00..51.92 rows=13 width=16) (actual time=0.058..0.115 rows=10 loops=1)
                           Index Cond: (dev_id = 18165)
                     ->  Index Scan using ix_ne_ns on node_ep  (cost=0.00..16137.45 rows=11700 width=26) (actual time=0.014..0.014 rows=0 loops=10)
                           Index Cond: (node_ep.node_id = device_nodes.node_id)
               ->  Index Scan using ix_nefp_eid on ep_fp  (cost=0.00..0.28 rows=1 width=20) (actual time=0.006..0.007 rows=1 loops=1)
                     Index Cond: (ep_fp.ep_id = node_ep.ep_id)

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
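A quick way to see the staleness Josh describes - an upper histogram
bound that has fallen behind the table's true maximum - is to compare
the stored statistics against the live data. A sketch only; the table
and column names are taken from the plan above, and pg_stats is the
standard statistics view:

    SELECT histogram_bounds
      FROM pg_stats
     WHERE tablename = 'device_nodes' AND attname = 'node_id';

    SELECT max(node_id) FROM device_nodes;
    -- if max(node_id) exceeds the last histogram bound, selectivity
    -- estimates near the top of the range will be badly off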
On Wed, Mar 23, 2011 at 2:12 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Folks,
>
> ...
> It really seems like we should be able to detect an obvious high-risk
> situation like this one. Or maybe we're just being too optimistic about
> discarding subplans?

Why not let the planner learn from past mistakes? If somehow a
post-mortem analysis of queries can be done and accounted for, then
these kinds of mistakes would be a one-time occurrence.

Ideas:

 * you estimate cost IFF there's no past experience
 * if rowcount estimates miss by much, a correction cache could be
   populated with extra (volatile - ie, in shared memory) statistics
 * or, if rowcount estimates miss by much, autoanalyze could be scheduled
 * consider plan bailout: execute a tempting plan, but if it takes too
   long or its effective cost rises well above the expected cost, bail
   to a safer plan
 * account for worst-case performance when evaluating plans
On 3/23/11 10:35 AM, Claudio Freire wrote:
> * consider plan bailout: execute a tempting plan, but if it takes too
> long or its effective cost rises well above the expected cost, bail
> to a safer plan

That would actually solve this particular case. It would still require
us to have some definition of "safer", though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Wed, Mar 23, 2011 at 5:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 3/23/11 10:35 AM, Claudio Freire wrote:
>> * consider plan bailout: execute a tempting plan, but if it takes too
>> long or its effective cost rises well above the expected cost, bail
>> to a safer plan
>
> That would actually solve this particular case. It would still require
> us to have some definition of "safer", though.

In my head, safer = better worst-case performance.
On Wed, Mar 23, 2011 at 1:12 PM, Josh Berkus <josh@agliodbs.com> wrote:
> AFAICT, what's happening in this query is that PostgreSQL's statistics
> on the device_nodes and several other tables are slightly out of date
> (as in 5% of the table).

What about some manner of query feedback mechanism (along the lines of
what EXPLAIN ANALYZE yields) to track "stats staleness" in general?
Probably I misunderstand the costs of executing EXPLAIN ANALYZE.
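For the staleness-tracking side, the cumulative stats system already
records when each table was last analyzed. A sketch of a crude
staleness check using pg_stat_user_tables (columns as they exist in
the releases this thread covers):

    SELECT relname, n_live_tup, n_dead_tup,
           last_analyze, last_autoanalyze
      FROM pg_stat_user_tables
     ORDER BY GREATEST(last_analyze, last_autoanalyze) NULLS FIRST;
    -- never-analyzed tables sort first; GREATEST ignores NULL inputs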
Claudio Freire <klaussfreire@gmail.com> writes:
> On Wed, Mar 23, 2011 at 5:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 3/23/11 10:35 AM, Claudio Freire wrote:
>>> * consider plan bailout: execute a tempting plan, but if it takes too
>>> long or its effective cost rises well above the expected cost, bail
>>> to a safer plan

>> That would actually solve this particular case. It would still require
>> us to have some definition of "safer", though.

> In my head, safer = better worst-case performance.

If the planner starts operating on the basis of worst case rather than
expected-case performance, the complaints will be far more numerous
than they are today.

			regards, tom lane
On Wed, Mar 23, 2011 at 6:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> In my head, safer = better worst-case performance.
>
> If the planner starts operating on the basis of worst case rather than
> expected-case performance, the complaints will be far more numerous
> than they are today.

I imagine that's why, if you put my comment in context, I was talking
about picking a safer plan only when the "better on average" one fails
miserably.
> If the planner starts operating on the basis of worst case rather than
> expected-case performance, the complaints will be far more numerous
> than they are today.

Yeah, I don't think that's the way to go. The other thought I had was
to accumulate a "risk" stat the same way we accumulate a "cost" stat.

However, I'm thinking that I'm overengineering what seems to be a
fairly isolated problem, in that we might simply need to adjust the
costing on this kind of a plan.

Also, can I say that the cost figures in this plan are extremely
confusing? Is it really necessary to show them the way we do?

 Merge Join  (cost=29.16..1648.00 rows=382 width=78) (actual time=57215.167..57215.216 rows=1 loops=1)
   Merge Cond: (rn.node_id = device_nodes.node_id)
   ->  Nested Loop  (cost=0.00..11301882.40 rows=6998 width=62) (actual time=57209.291..57215.030 rows=112 loops=1)
         Join Filter: (node_ep.node_id = rn.node_id)
         ->  Nested Loop  (cost=0.00..11003966.85 rows=90276 width=46) (actual time=0.027..52792.422 rows=90195 loops=1)

The first time I saw the above, I thought we had some kind of glibc
math bug on the host system. Costs are supposed to accumulate upwards.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
2011/3/23 Tom Lane <tgl@sss.pgh.pa.us>:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> On Wed, Mar 23, 2011 at 5:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 3/23/11 10:35 AM, Claudio Freire wrote:
>>>> * consider plan bailout: execute a tempting plan, but if it takes too
>>>> long or its effective cost rises well above the expected cost, bail
>>>> to a safer plan
>>> That would actually solve this particular case. It would still require
>>> us to have some definition of "safer", though.
>> In my head, safer = better worst-case performance.
>
> If the planner starts operating on the basis of worst case rather than
> expected-case performance, the complaints will be far more numerous
> than they are today.

This can be GUC-controllable, like plan_safety=0..1 with a low default
value. It could influence the costs of plans whose cost changes
dramatically with small table changes and/or uncertain statistics. It
could also be used as a direct "hint" for such dangerous queries by
changing the GUC for a session/single query.

--
Best regards,
Vitalii Tymchyshyn
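Until something like plan_safety exists, the closest per-query knob is
the existing enable_* settings, which can steer a single query away
from a plan shape known to be risky. A blunt instrument, shown only as
a sketch of the session/single-query usage Vitalii describes:

    BEGIN;
    SET LOCAL enable_mergejoin = off;  -- discourage the risky merge join
    -- ... run the problem query here ...
    COMMIT;                            -- SET LOCAL reverts automatically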
2011/3/24 Віталій Тимчишин <tivv00@gmail.com>:
> 2011/3/23 Tom Lane <tgl@sss.pgh.pa.us>:
>> If the planner starts operating on the basis of worst case rather than
>> expected-case performance, the complaints will be far more numerous
>> than they are today.
>
> This can be GUC-controllable, like plan_safety=0..1 with a low default
> value. It could influence the costs of plans whose cost changes
> dramatically with small table changes and/or uncertain statistics. It
> could also be used as a direct "hint" for such dangerous queries by
> changing the GUC for a session/single query.

ISTM if you add statistics miss and 'risk margin' to the things the
planner would have to consider while generating a plan, you are greatly
increasing the number of plan paths that would have to be considered
for any non-trivial query.

merlin
>> This can be GUC-controllable, like plan_safety=0..1 with a low default
>> value. It could influence the costs of plans whose cost changes
>> dramatically with small table changes and/or uncertain statistics. It
>> could also be used as a direct "hint" for such dangerous queries by
>> changing the GUC for a session/single query.
>
> ISTM if you add statistics miss and 'risk margin' to the things the
> planner would have to consider while generating a plan, you are greatly
> increasing the number of plan paths that would have to be considered
> for any non-trivial query.

FWIW, these ideas are well established in the statistical community.

Currently, we are essentially using "maximum likelihood estimators". We
estimate a bunch of selectivities by choosing what is most likely, plug
them into our objective function, and then minimize based upon the
plugged-in values. In a wide variety of cases, MLEs can be shown to be
"asymptotically" optimal. That is, as our sample distribution
approaches the true distribution, the best we can possibly do is to use
the MLE. This is pretty sensible - if we actually knew all of the
selectivities then the results aren't really random anymore. However,
MLEs often perform very poorly with small sample sizes - particularly
if the loss function is very sensitive to relatively small fluctuations
in the parameter estimates (which Postgres certainly is - switching
from a hash join to a nest-loop can be disastrous).

Using the estimate that minimizes the "worst-case" performance is
precisely a minimax estimator. There, the goal is to minimize the risk
function (iow, plan cost) under the worst possible conditions.
Wikipedia has a pretty good treatment - just think "plan cost" whenever
you see "risk".

Another approach, that hasn't been suggested yet, is some Bayesian
update method. There, rather than calculating a specific parameter
value (like ndistinct), you try to store the entire distribution and
choose the plan that minimizes cost averaged over all of the possible
parameter values.

Example: (please excuse the unrealistic numbers)

Rather than estimate the selectivity of the join (relation1.col1 =
relation2.col1) to be 0.01, we would say it is 0.1 w/ probability 0.2
and 0.001 w/ probability 0.8. So, here is how we would choose the plan
now:

cost( nestloop  | selectivity = 0.01 ) = 1
cost( hashjoin  | selectivity = 0.01 ) = 2
cost( mergejoin | selectivity = 0.01 ) = 50

Here would be the Bayesian approach:

cost( nestloop  | selectivity = 0.001 ) = 0.1
cost( hashjoin  | selectivity = 0.001 ) = 1
cost( mergejoin | selectivity = 0.001 ) = 50

cost( nestloop  | selectivity = 0.1 ) = 10
cost( hashjoin  | selectivity = 0.1 ) = 3
cost( mergejoin | selectivity = 0.1 ) = 50

So, the Bayesian costs are:

nestloop:  0.1*0.8 + 10*0.2 = 2.08
hashjoin:  1.0*0.8 +  3*0.2 = 1.4
mergejoin:  50*0.8 + 50*0.2 = 50

so the hashjoin would be chosen.

For completeness, the minimax costs would be:

nestloop:  max( 0.1, 10 ) = 10
hashjoin:  max( 1, 3 )    = 3
mergejoin: max( 50, 50 )  = 50

So, again, the hashjoin is chosen.

I obviously have a bias towards the Bayesian approach, but it's not
because I expect it to necessarily perform a whole lot better; rather,
it reduces to the other two approaches. If we want the current
behavior, then simply store the MLE selectivity with probability 1. If
we want the minimax estimate, choose the worst possible value. Or
anything in between.
Also (not that I have even close to the experience / expertise to make
this claim - so take this with a grain of salt), it seems that the code
changes would be substantial but pretty straightforward and easy to
localize. Rather than passing a selectivity, pass a pair of arrays with
selectivities and probabilities. Initially, we could keep the current
estimates (every passed array would be of length one) and then make
changes as problems appear (like Josh's).

I hope my little estimation procedure tutorial has been a little
helpful; please feel free to contact me off list if you have
questions/want references.

Best,
Nathan Boley
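Nathan's arithmetic is easy to replay. A sketch in plain SQL, using his
made-up numbers, that computes both the Bayesian (probability-weighted)
cost and the minimax cost per join strategy (runs on any release with
CTEs, 8.4+):

    WITH costs(plan, selectivity, prob, cost) AS (
      VALUES ('nestloop',  0.001, 0.8,  0.1),
             ('nestloop',  0.1,   0.2, 10.0),
             ('hashjoin',  0.001, 0.8,  1.0),
             ('hashjoin',  0.1,   0.2,  3.0),
             ('mergejoin', 0.001, 0.8, 50.0),
             ('mergejoin', 0.1,   0.2, 50.0)
    )
    SELECT plan,
           sum(prob * cost) AS bayes_cost,   -- expected cost over the prior
           max(cost)        AS minimax_cost  -- worst case
      FROM costs
     GROUP BY plan
     ORDER BY bayes_cost;
    -- hashjoin wins under both criteria: bayes_cost 1.4, minimax 3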
On Thu, Mar 24, 2011 at 5:30 PM, Nathan Boley <npboley@gmail.com> wrote:
> Another approach, that hasn't been suggested yet, is some Bayesian
> update method. There, rather than calculating a specific parameter
> value (like ndistinct), you try to store the entire distribution and
> choose the plan that minimizes cost averaged over all of the possible
> parameter values.

I've done similar stuff for work; you don't have to go all the way to
storing complete probability distributions, usually a simple likelihood
range is enough.

In essence, instead of having a scalar MLE for plan cost, you implement
a "ranged" estimator that estimates the most-likely range of plan
costs, with a mean and a standard deviation from that mean. This
essentially gives a risk value, since risky plans will have very large
standard deviations from the mean.

> Also (not that I have even close to the experience / expertise to make
> this claim - so take this with a grain of salt), it seems that the code
> changes would be substantial but pretty straightforward and easy to
> localize. Rather than passing a selectivity, pass a pair of arrays with
> selectivities and probabilities.

If you approximate the probability distributions as I outlined above,
it's even simpler. Approximate, but simpler - and since you retain the
original cost estimations in the form of mean cost values, you can
easily tune the GEQO to perform as it currently does (ignore variance)
or with a safety margin (account for variance).

About issues like these being uncommon - I disagree. I routinely have
to work around query inefficiencies because GEQO does something odd -
and since Postgres gives me too few tools to tweak plans (increase
statistics, use subqueries, rephrase joins; no direct tool before CTEs,
which are rather new), it becomes an art form, and it becomes very
unpredictable and an administrative burden. Out of the blue, statistics
change, queries that worked fine start to perform poorly, and sites go
down.

If GEQO could detect unsafe plans and work around them automatically,
it would be a major improvement. Granted, though, this should be
approached with care.
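Claudio's ranged estimator can be sketched the same way: treat each
plan's cost as a distribution, keep its mean and standard deviation,
and rank plans by mean plus a variance penalty. The cost samples and
the 2.0 safety factor below are made up for illustration:

    WITH samples(plan, cost) AS (
      VALUES ('nestloop', 0.1), ('nestloop', 10.0),
             ('hashjoin', 1.0), ('hashjoin', 3.0)
    )
    SELECT plan,
           avg(cost)                            AS mean_cost,
           stddev_samp(cost)                    AS sd,
           avg(cost) + 2.0 * stddev_samp(cost)  AS risk_adjusted_cost
      FROM samples
     GROUP BY plan
     ORDER BY risk_adjusted_cost;
    -- the high-variance nestloop is penalized; setting the factor to 0
    -- reproduces today's ignore-variance behavior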
24.03.11 20:41, Merlin Moncure wrote:
> 2011/3/24 Віталій Тимчишин <tivv00@gmail.com>:
>> This can be GUC-controllable, like plan_safety=0..1 with a low default
>> value. It could influence the costs of plans whose cost changes
>> dramatically with small table changes and/or uncertain statistics. It
>> could also be used as a direct "hint" for such dangerous queries by
>> changing the GUC for a session/single query.
>
> ISTM if you add statistics miss and 'risk margin' to the things the
> planner would have to consider while generating a plan, you are greatly
> increasing the number of plan paths that would have to be considered
> for any non-trivial query.

Why so? I simply change the cost estimation functions. This won't
change the number of paths.

Best regards,
Vitalii Tymchyshyn
Vitalii Tymchyshyn <tivv00@gmail.com> writes:
> 24.03.11 20:41, Merlin Moncure wrote:
>> ISTM if you add statistics miss and 'risk margin' to the things the
>> planner would have to consider while generating a plan, you are greatly
>> increasing the number of plan paths that would have to be considered
>> for any non-trivial query.

> Why so? I simply change the cost estimation functions. This won't
> change the number of paths.

If you have multiple figures of merit, that means you have to keep more
paths, with a consequent slowdown when it comes to choosing which path
to use at higher join levels.

As an example, we used to keep only the paths with best total cost.
When we started to optimize LIMIT, we had to keep around paths with
best startup cost too, in case that made for the best combined result
at a higher join level. If you're going to consider "risk" while
choosing paths, that means you'll start keeping paths you would have
discarded before, while not necessarily getting rid of any other paths.
The only way to avoid that would be to have a completely brain-dead
notion of risk that wasn't affected by how the path is used at a higher
join level, and I'm pretty sure that that wouldn't solve anybody's
problem.

Any significant expansion of the planner's fundamental cost model
*will* make it slower. By a lot. Rather than going into this with
fantasies of "it won't cost anything", you should be worrying about how
to keep the speed penalty to factor-of-two rather than factor-of-ten.

			regards, tom lane
Josh Berkus <josh@agliodbs.com> writes:
>> If the planner starts operating on the basis of worst case rather than
>> expected-case performance, the complaints will be far more numerous
>> than they are today.

> Yeah, I don't think that's the way to go. The other thought I had was
> to accumulate a "risk" stat the same way we accumulate a "cost" stat.

> However, I'm thinking that I'm overengineering what seems to be a
> fairly isolated problem, in that we might simply need to adjust the
> costing on this kind of a plan.

mergejoinscansel doesn't currently try to fix up the histogram bounds
by consulting indexes. At the time I was afraid of the costs of doing
that, and I still am; but it would be a way to address this issue.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_0_BR [40608e7f9] 2010-01-04 02:44:40 +0000

    When estimating the selectivity of an inequality "column > constant"
    or "column < constant", and the comparison value is in the first or
    last histogram bin or outside the histogram entirely, try to fetch
    the actual column min or max value using an index scan (if there is
    an index on the column). If successful, replace the lower or upper
    histogram bound with that value before carrying on with the
    estimate. This limits the estimation error caused by moving min/max
    values when the comparison value is close to the min or max. Per a
    complaint from Josh Berkus.

    It is tempting to consider using this mechanism for mergejoinscansel
    as well, but that would inject index fetches into main-line join
    estimation not just endpoint cases. I'm refraining from that until
    we can get a better handle on the costs of doing this type of
    lookup.

			regards, tom lane
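What the commit does is, in effect, an index endpoint probe. The SQL
equivalent is a sketch only - the planner performs this internally with
a direct index access, not by issuing a query - with table and column
names taken from Josh's plan:

    SELECT node_id FROM device_nodes
     ORDER BY node_id DESC LIMIT 1;  -- actual max, read off the index

This is cheap when an index on the column exists, and the value fetched
replaces the stale upper histogram bound before the estimate is made.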
25.03.11 16:12, Tom Lane wrote:
> Vitalii Tymchyshyn <tivv00@gmail.com> writes:
>> Why so? I simply change the cost estimation functions. This won't
>> change the number of paths.
>
> If you have multiple figures of merit, that means you have to keep more
> paths, with a consequent slowdown when it comes to choosing which path
> to use at higher join levels.
>
> As an example, we used to keep only the paths with best total cost.
> When we started to optimize LIMIT, we had to keep around paths with
> best startup cost too, in case that made for the best combined result
> at a higher join level. If you're going to consider "risk" while
> choosing paths, that means you'll start keeping paths you would have
> discarded before, while not necessarily getting rid of any other paths.
> The only way to avoid that would be to have a completely brain-dead
> notion of risk that wasn't affected by how the path is used at a higher
> join level, and I'm pretty sure that that wouldn't solve anybody's
> problem.
>
> Any significant expansion of the planner's fundamental cost model
> *will* make it slower. By a lot. Rather than going into this with
> fantasies of "it won't cost anything", you should be worrying about how
> to keep the speed penalty to factor-of-two rather than factor-of-ten.

But I am not talking about a model change; it's more like a formula
change. Introducing LIMIT added one variable where the outer plan could
influence inner plan selection. But I am talking simply about the cost
calculation for a given node. Now the cost is based on the statistical
expected value; the proposal is (something like) to take the maximum
cost on an n% probability range near the expected value. This, of
course, will make calculations slower, but won't add any degree of
freedom to the calculations.

Best regards,
Vitalii Tymchyshyn
On 3/23/11 2:08 PM, "Claudio Freire" <klaussfreire@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 6:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Claudio Freire <klaussfreire@gmail.com> writes:
>>> In my head, safer = better worst-case performance.
>>
>> If the planner starts operating on the basis of worst case rather than
>> expected-case performance, the complaints will be far more numerous
>> than they are today.
>
> I imagine that's why, if you put my comment in context, I was talking
> about picking a safer plan only when the "better on average" one fails
> miserably.

Postgres' assumption about what is 'better on average' is wrong in the
presence of nonlinear relationships between various statistics and
execution time anyway:

  AVG(f(x)) != f(AVG(x))

In English, the fastest plan for the average (most likely) case is not
always the fastest plan on average. It works very well for many cases,
but falls down in others. Many of the 'why is this query slow' and 'I
wish there were hints' problems I see here that are not user error seem
related to this.

The approaches discussed by Nathan Boley and Claudio Freire in this
thread could significantly mitigate many of the issues I have seen when
wrestling with the planner.
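The AVG(f(x)) != f(AVG(x)) point is easy to demonstrate; a one-query
sketch with x*x standing in for a nonlinear cost function (the sample
values are made up):

    SELECT avg(x*x) AS avg_of_f, avg(x)*avg(x) AS f_of_avg
      FROM (VALUES (1.0), (2.0), (9.0)) AS t(x);
    -- avg_of_f ~ 28.7, f_of_avg = 16: costing the "average case" and
    -- averaging the cost differ whenever f is nonlinear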
> mergejoinscansel doesn't currently try to fix up the histogram bounds
> by consulting indexes. At the time I was afraid of the costs of doing
> that, and I still am; but it would be a way to address this issue.

Another, cheaper but less accurate, way to deal with this is to note
that we are trying to estimate the max of the population by using the
max of the sample, which obviously has a negative bias. If we could
correct the bias (through the bootstrap, or an analytical correction
under some parametric assumptions, e.g. that the distribution is
uniform in the last bucket), then we should get better estimates at the
cost of some analyze time. But this wouldn't even deal with Josh's
particular problem, since it's due to out-of-date stats rather than
sampling error...

-Nathan
> mergejoinscansel doesn't currently try to fix up the histogram bounds
> by consulting indexes. At the time I was afraid of the costs of doing
> that, and I still am; but it would be a way to address this issue.

Oh? Hmmm. I have a ready-made test case for the benefit case on this.
However, I'm not clear on how we would test the costs.

But this type of query plan is clearly pathological, and is experienced
by users as a performance regression since 8.3. I now have the user
doing analyzes of fairly large tables twice an hour to avoid the
problem. So I don't think we can leave it alone.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco
On Thu, Mar 24, 2011 at 4:30 PM, Nathan Boley <npboley@gmail.com> wrote:
> Another approach, that hasn't been suggested yet, is some Bayesian
> update method. There, rather than calculating a specific parameter
> value (like ndistinct), you try to store the entire distribution and
> choose the plan that minimizes cost averaged over all of the possible
> parameter values.
> [...]
> I obviously have a bias towards the Bayesian approach, but it's not
> because I expect it to necessarily perform a whole lot better; rather,
> it reduces to the other two approaches. If we want the current
> behavior, then simply store the MLE selectivity with probability 1.
> If we want the minimax estimate, choose the worst possible value. Or
> anything in between.

This is a very elegant suggestion to this problem, and I really like
it. It models the concept we're trying to capture here, which is that
we're sometimes just guessing how things really are, and if it turns
out that we're way off, we may be stuck in a pathologically bad plan.

One limitation of this method is that it is difficult to apply more
than locally. Consider:

SELECT * FROM foo, bar WHERE foo.id = bar.id AND some_crazy_function(foo.id)

The best method of joining foo and bar is likely going to depend on the
percentage of rows in foo for which some_crazy_function(foo.id) returns
true, and we have no idea what that is. We could represent that by
kicking out a range of probabilistic *cost* estimates for each path
over foo, but that adds a lot of code complexity. And compute time -
because (I think) now we've made it so that more paths have to
percolate all the way up through the join tree.

It's perhaps also worth looking at our old nemesis:

SELECT * FROM foo WHERE a = 1 ORDER BY b LIMIT 1

What I really want to do here is have the planner be able to reflect
the fact that an index scan over b may be very expensive or very cheap
depending on how lucky we get applying the a = 1 predicate, but I'm not
quite sure how to make that work.

It seems like the time when this would help the most without costing
too much or requiring excessively invasive surgery is the case where
the join selectivity itself is uncertain. We can estimate that fairly
accurately as far as MCVs go, but after that it does get very murky.
Still, my gut feeling is that many (not all) of the worst problems
actually bubble up from under the join, rather than happening at that
level.

Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
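For the record, the usual workaround for that last nemesis is to remove
the gamble entirely: a composite index lets the a = 1 predicate and the
ORDER BY b be satisfied by a single ordered scan. A sketch against
Robert's hypothetical table:

    CREATE INDEX foo_a_b_idx ON foo (a, b);
    -- now "WHERE a = 1 ORDER BY b LIMIT 1" can read one tuple from the
    -- index instead of betting on how soon a = 1 turns up in b order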
On Fri, Mar 25, 2011 at 10:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>>> If the planner starts operating on the basis of worst case rather
>>> than expected-case performance, the complaints will be far more
>>> numerous than they are today.
>
>> Yeah, I don't think that's the way to go. The other thought I had was
>> to accumulate a "risk" stat the same way we accumulate a "cost" stat.
>
>> However, I'm thinking that I'm overengineering what seems to be a
>> fairly isolated problem, in that we might simply need to adjust the
>> costing on this kind of a plan.
>
> mergejoinscansel doesn't currently try to fix up the histogram bounds
> by consulting indexes. At the time I was afraid of the costs of doing
> that, and I still am; but it would be a way to address this issue.

Apparently, this is a pain point for the MySQL query planner - not so
much for merge joins, which I don't think are supported in any of the
major forks anyway - but the planner's desire to go estimate things by
probing the indexes. IIRC, the MariaDB guys are looking into adding
persistent statistics to address this problem.

That doesn't necessarily mean that we shouldn't do this, but it
probably does mean that we should be awfully careful about it.

Another thought is that we might want to consider reducing
autovacuum_analyze_scale_factor. The root of the original problem
seems to be that the table had some data churn but not enough to cause
an ANALYZE. Now, if the data churn is random, auto-analyzing after 10%
churn might be reasonable, but a lot of data churn is non-random, and
ANALYZE is fairly cheap. I'm just shooting in the dark here; I might be
all wet.

I think part of the problem is that the AV launcher isn't very smart
about looking at the overall picture. It'd be nice, for example, to be
able to be more aggressive when the system is quiet and to be a bit
more careful when the system is saturated, but it's a bit tricky to
think about how to make that work, or exactly what the heuristics
should be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/19/11 7:29 AM, Robert Haas wrote:
> Another thought is that we might want to consider reducing
> autovacuum_analyze_scale_factor. The root of the original problem
> seems to be that the table had some data churn but not enough to cause
> an ANALYZE. Now, if the data churn is random, auto-analyzing after 10%
> churn might be reasonable, but a lot of data churn is non-random, and
> ANALYZE is fairly cheap.

I wouldn't reduce the defaults for PostgreSQL; this is something you do
on specific tables. For example, on very large tables I've been known
to set autovacuum_analyze_scale_factor to 0 and
autovacuum_analyze_threshold to 5000.

And don't assume that analyzing is always cheap. If you have an 800GB
table, most of which is very cold data, and have statistics targets set
to 5000 for some columns, accessing many of the older blocks could take
a while.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
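The per-table form of what Josh describes, as a sketch - these have
been available as relation storage parameters since 8.4, and
"big_table" is a stand-in name:

    ALTER TABLE big_table SET (
      autovacuum_analyze_scale_factor = 0,
      autovacuum_analyze_threshold    = 5000
    );
    -- auto-analyze after ~5000 row changes, regardless of table size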
On Apr 19, 2011, at 9:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
> For example, on very large tables I've been known to set
> autovacuum_analyze_scale_factor to 0 and
> autovacuum_analyze_threshold to 5000.

Huh? Why?

...Robert
On Mar 24, 2011, at 5:23 PM, Claudio Freire wrote:
> I routinely have to work around query inefficiencies because GEQO does
> something odd - and since Postgres gives me too few tools to tweak
> plans (increase statistics, use subqueries, rephrase joins; no direct
> tool before CTEs, which are rather new), it becomes an art form, and
> it becomes very unpredictable and an administrative burden. Out of the
> blue, statistics change, queries that worked fine start to perform
> poorly, and sites go down.
>
> If GEQO could detect unsafe plans and work around them automatically,
> it would be a major improvement.

This isn't limited to GEQO queries either. Every few months we'll have
what should be a very fast query suddenly become far slower. Still on
the order of seconds, but when you're running several of those a second
and they normally take fractions of a second, this kind of performance
degradation can easily bring a server to its knees. Every time this has
happened, the solution has been to re-analyze a fairly large table;
even with a default stats target of 1000 it's very easy for one bad
analyze to ruin your day.

--
Jim C. Nasby, Database Architect
jim@nasby.net
512.569.9461 (cell)
http://jim.nasby.net