Re: [WIP] cache estimates, cache access cost
From: Robert Haas
Subject: Re: [WIP] cache estimates, cache access cost
Date:
Msg-id: BANLkTi=bJTxii58qYekMkRr7+bhx20B4kA@mail.gmail.com
In reply to: Re: [WIP] cache estimates, cache access cost (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Responses: Re: [WIP] cache estimates, cache access cost
List: pgsql-hackers
On Tue, Jun 14, 2011 at 12:06 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
>> 1. ANALYZE happens far too infrequently to believe that any data taken
>> at ANALYZE time will still be relevant at execution time.
>
> ANALYZE happens when people execute it, else it is auto-analyze and I
> am not providing auto-analyze-oscache.
> ANALYZE OSCACHE is just a very simple wrapper to update pg_class. The
> frequency is not important here, I believe.

Well, I'm not saying you have to have all the answers to post a WIP patch, certainly. But in terms of getting something committable, it seems like we need to have at least an outline of what the long-term plan is. If ANALYZE OSCACHE is an infrequent operation, then the data isn't going to be a reliable guide to what will happen at execution time...

>> 2. Using data gathered by ANALYZE will make plans less stable, and our
>> users complain not infrequently about the plan instability we already
>> have, therefore we should not add more.

...and if it is a frequent operation, then it's going to result in unstable plans (and maybe pg_class bloat). There's a fundamental tension here that I don't think you can just wave your hands at.

> I was trying to split the patch size by group of features to reduce
> its size. The work is in progress.

Totally reasonable, but I can't see committing any of it without some evidence that there's light at the end of the tunnel. No performance tests *whatsoever* have been done. We can debate the exact amount of evidence that should be required to prove that something is useful from a performance perspective, but we at least need some.

I'm beating on this point because I believe that the whole idea of trying to feed this information back into the planner is going to turn out to be something that we don't want to do. I think it's going to turn out to have downsides that are far larger than the upsides. I am completely willing to be proven wrong, but right now I think this will make things worse, you think it will make things better, and I don't see any way to bridge that gap without doing some measurements.

For example, if you run this patch on a system and subject that system to a relatively even workload, how much do the numbers bounce around between runs? What if you vary the workload, so that you blast it with OLTP traffic at some times and then run reporting queries at other times? Or different tables become hot at different times?

Once you've written code to make the planner do something with the caching % values, then you can start to explore other questions. Can you generate plan instability, especially on complex queries, which are more prone to change quickly based on small changes in the cost estimates? Can you demonstrate a workload where bad performance is inevitable with the current code, but with your code, the system becomes self-tuning and ends up with good performance? What happens if you have a large cold table with a small hot end where all activity is concentrated?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
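To make the costing question concrete, here is a minimal sketch, in the style of costsize.c, of how a per-relation cache fraction might be blended into page-fetch costs. This is not code from the patch: page_fetch_cost, cache_fraction, and cached_page_cost are hypothetical names for illustration; only random_page_cost is an existing GUC.

    #include <stdio.h>

    /*
     * Hypothetical sketch, not code from the patch: blend a per-relation
     * cache fraction (such as ANALYZE OSCACHE might store in pg_class)
     * into the page-fetch cost.  cached_page_cost is an assumed GUC;
     * random_page_cost is the existing one (default 4.0).
     */
    static double
    page_fetch_cost(double pages_fetched,
                    double cache_fraction,   /* 0.0 .. 1.0, assumed stat */
                    double cached_page_cost, /* assumed GUC, e.g. 0.01 */
                    double random_page_cost) /* existing GUC, default 4.0 */
    {
        double hit_pages  = pages_fetched * cache_fraction;
        double miss_pages = pages_fetched - hit_pages;

        /* cheap fetches for cached pages, full I/O cost for the rest */
        return hit_pages * cached_page_cost + miss_pages * random_page_cost;
    }

    int
    main(void)
    {
        /* the same 10000-page table, measured at two different moments */
        printf("90%% cached: %.1f\n", page_fetch_cost(10000, 0.90, 0.01, 4.0));
        printf("10%% cached: %.1f\n", page_fetch_cost(10000, 0.10, 0.01, 4.0));
        return 0;
    }

With 10000 pages, the estimate moves from about 4090 at 90% cached to about 36010 at 10% cached, roughly a 9x swing from the same table. That is the scale of movement that could flip plan choices between runs, which is why the variance measurements discussed above matter.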