Обсуждение: Could ANALYZE estimate bloat?
Hackers, I've been tinkering with a number of table bloat checks, and it occurred to me that the problm is that these are all approximations based on overall gross statistics, and as such highly inaccurate. It seems like would could have ANALYZE, while sampling from the table, also check the amount of dead space per page and use that as an estimate of the % of dead space overall. We'd still need something else for indexes, but this seems like it would be a good start. No? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 09/20/2013 11:59 AM, Josh Berkus wrote: > > Hackers, > > I've been tinkering with a number of table bloat checks, and it occurred > to me that the problm is that these are all approximations based on > overall gross statistics, and as such highly inaccurate. > > It seems like would could have ANALYZE, while sampling from the table, > also check the amount of dead space per page and use that as an estimate > of the % of dead space overall. We'd still need something else for > indexes, but this seems like it would be a good start. > > No? I think this is a great idea. > -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
Josh, * Josh Berkus (josh@agliodbs.com) wrote: > I've been tinkering with a number of table bloat checks, and it occurred > to me that the problm is that these are all approximations based on > overall gross statistics, and as such highly inaccurate. Greg Smith and I discussed some improvements around this area @Open. I'd suggest you talk with him about the ideas that he has. > It seems like would could have ANALYZE, while sampling from the table, > also check the amount of dead space per page and use that as an estimate > of the % of dead space overall. We'd still need something else for > indexes, but this seems like it would be a good start. > > No? Err, I thought that was one of the things like ANALYZE *did* collect through the 'live tuples' number? Thanks, Stephen
On 09/20/2013 02:17 PM, Stephen Frost wrote: >> No? > > Err, I thought that was one of the things like ANALYZE *did* collect > through the 'live tuples' number? Nope. "live tuples" is updated by the stats collector, NOT by analyze.Also, live vs. dead tuples doesn't tell me how muchfree *space* is available on pages. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote: > Also, live vs. dead tuples doesn't tell me how much free *space* is > available on pages. I'm not really sure that you'd get much better from ANALYZE than you get from tracking the inserted/updated/deleted tuples. Collecting that information when VACUUM'ing the table would certainly provide much more accurate results, which could possibly be stored in a page-level bitmap of "completely empty pages" at the beginning of each 1G segment. Alternatively, the bitmap could be updated during processing instead of waiting for a VACUUM. Greg and I hypothesized that such a bitmap might be used to truncate individual 1G segments in the middle of a relation rather than only at the end, perhaps all the way down to a point where only a header plus the page-level bitmap in the 1G segment are left. This was discussed in context of a VACUUM which used the try-to-elevate-the-lock approach already used to truncate the last 1G segment of the relations, though I've also wondered if it could take page-level locks starting at the end of the 1G segment and walking backwards until it's unable to acquire a lock or a non-empty page is found. Of course, we're aware of the issues around the storage management system and interfaces which might make this entirely unrealistic but it's getting to a point where, I think (not sure about Greg), we need to deal with that in some way to improve on issues like this. Thanks, Stephen