Обсуждение: More problems with VacuumPageHit style global variables
My recent bugfix commit d3609dd2 addressed an issue with VACUUM VERBOSE. It would aggregate buffers hit/missed/dirtied counts incorrectly (by double counting), though only when there are multiple heap rels processed by the same VACUUM command. It failed to account for the fact that the VacuumPageHit, VacuumPageMiss, and VacuumPageDirty global variables were really only designed to work in autovacuum (see 2011 commit 9d3b5024). I just realized that there is one remaining problem: parallel VACUUM doesn't care about these global variables, so there will still be discrepancies there. I can't really blame that on parallel VACUUM, though, because vacuumparallel.c at least copies buffer usage counters from parallel workers (stored in its PARALLEL_VACUUM_KEY_BUFFER_USAGE space). I wonder why we still have these seemingly redundant global variables, which are maintained by bufmgr.c (alongside the pgBufferUsage stuff). It looks like recent commit 5dc0418fab ("Prefetch data referenced by the WAL, take II") taught bufmgr.c to instrument all buffer accesses. So it looks like we just don't need VacuumPageHit and friends anymore. Wouldn't it be better if every VACUUM used the same generic approach, using pgBufferUsage? As a general rule code that only runs in autovacuum is a recipe for bugs. It looks like VacuumPageHit is maintained based on different rules to pgBufferUsage.shared_blks_hit in bufmgr.c (just as an example), which seems like a bad sign. Besides, the pgBufferUsage counters have more information, which seems like it might be useful to the lazyvacuum.c instrumentation. One question for the author of the WAL prefetch patch, Thomas (CC'd): It's not 100% clear what the expectation is with pgBufferUsage when track_io_timing is off, so are fields like pgBufferUsage.shared_blks_hit (i.e. those that don't have a time/duration component) officially okay to rely on across the board? It looks like they are okay to rely on (even when track_io_timing is off), but it would be nice to put that on a formal footing, if it isn't already. -- Peter Geoghegan
On Thu, Apr 21, 2022 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote: > I just realized that there is one remaining problem: parallel VACUUM > doesn't care about these global variables, so there will still be > discrepancies there. I can't really blame that on parallel VACUUM, > though, because vacuumparallel.c at least copies buffer usage counters > from parallel workers (stored in its PARALLEL_VACUUM_KEY_BUFFER_USAGE > space). I wonder why we still have these seemingly redundant global > variables, which are maintained by bufmgr.c (alongside the > pgBufferUsage stuff). It looks like recent commit 5dc0418fab > ("Prefetch data referenced by the WAL, take II") taught bufmgr.c to > instrument all buffer accesses. So it looks like we just don't need > VacuumPageHit and friends anymore. Yeah, that sounds right. > Wouldn't it be better if every VACUUM used the same generic approach, > using pgBufferUsage? As a general rule code that only runs in > autovacuum is a recipe for bugs. It looks like VacuumPageHit is > maintained based on different rules to pgBufferUsage.shared_blks_hit > in bufmgr.c (just as an example), which seems like a bad sign. > Besides, the pgBufferUsage counters have more information, which seems > like it might be useful to the lazyvacuum.c instrumentation. +1 > One question for the author of the WAL prefetch patch, Thomas (CC'd): > It's not 100% clear what the expectation is with pgBufferUsage when > track_io_timing is off, so are fields like > pgBufferUsage.shared_blks_hit (i.e. those that don't have a > time/duration component) officially okay to rely on across the board? > It looks like they are okay to rely on (even when track_io_timing is > off), but it would be nice to put that on a formal footing, if it > isn't already. Right, that commit did this, plus the local variant: @@ -680,6 +682,8 @@ ReadRecentBuffer(RelFileNode rnode, ForkNumber forkNum, BlockNumber blockNum, else PinBuffer_Locked(bufHdr); /* pin for first time */ + pgBufferUsage.shared_blks_hit++; + return true; } I should perhaps have committed those changes separately with their own explanation, since it was really an oversight in commit 2f27f8c5114 that this type of hit wasn't counted (as noted by Julien in review of the WAL prefetcher). I doubt anyone else has discovered that function, which has no caller in PG14. As for your general question, I think you must be right. From a quick rummage around in the commit log, it does appear that commit cddca5ec (2009), which introduced pgBufferUsage, always bumped the counters unconditionally. It predated track_io_timing by years (40b9b957694 (2012)), and long before that the Berkeley code already had a simpler thing along those lines (ReadBufferCount, BufferHitCount etc). I didn't look up the discussion, but I wonder if the reason commit 9d3b5024435 (2011) introduced VacuumPage{Hit,Miss,Dirty} instead of measuring level changes in pgBufferUsage is that pgBufferUsage didn't have a dirty count until commit 2254367435f (2012), and once the authors had decided they'd need a new special counter for that, they continued down that path and added the others too?
On Wed, Apr 20, 2022 at 7:50 PM Thomas Munro <thomas.munro@gmail.com> wrote: > As for your general question, I think you must be right. From a quick > rummage around in the commit log, it does appear that commit cddca5ec > (2009), which introduced pgBufferUsage, always bumped the counters > unconditionally. It predated track_io_timing by years (40b9b957694 > (2012)), and long before that the Berkeley code already had a simpler > thing along those lines (ReadBufferCount, BufferHitCount etc). I > didn't look up the discussion, but I wonder if the reason commit > 9d3b5024435 (2011) introduced VacuumPage{Hit,Miss,Dirty} instead of > measuring level changes in pgBufferUsage is that pgBufferUsage didn't > have a dirty count until commit 2254367435f (2012), and once the > authors had decided they'd need a new special counter for that, they > continued down that path and added the others too? I knew about pgBufferUsage, and I knew about VacuumPage{Hit,Miss,Dirty} for a long time. But somehow I didn't make the very obvious connection between the two until today. I am probably not the only one. -- Peter Geoghegan
On Wed, Apr 20, 2022 at 8:00 PM Peter Geoghegan <pg@bowt.ie> wrote: > I knew about pgBufferUsage, and I knew about > VacuumPage{Hit,Miss,Dirty} for a long time. But somehow I didn't make > the very obvious connection between the two until today. I am probably > not the only one. What about pgStatBlockWriteTime/pgstat_count_buffer_write_time(), which also seem redundant? These were added by commit 64482890 back in 2012 (though under slightly different names), and are still used today by code that aggregates database-level timing stats -- see pgstat_update_dbstats(). Code like FlushBuffer() maintains both pgStatBlockWriteTime and pgBufferUsage.blk_write_time (iff track_io_timing is on). So looking at both the consumer side and the produce side makes it no more clear why both are needed. I suspect pgStatBlockWriteTime exists because of a similar kind of historic confusion, or losing track of things. There are similar-looking variables named things like pgStatXactCommit, which are not redundant (since pgBufferUsage doesn't have any of that, just granular I/O timing stuff). It would have been easy to miss the fact that only a subset of these pgStat* variables were redundant. Also seems possible that there was confusion about which variable was owned by what subsystem, with the pgStat* stuff appearing to be a stats collector thing, while pgBufferUsage appeared to be an executor thing. I don't think that there is any risk of one user of either variable "clobbering" some other user -- the current values of the variables are not actually meaningful at all. They're only useful as a way that an arbitrary piece of code instruments an arbitrary operation, by making their own copies, running whatever the operation is, and then reporting on the deltas. Which makes it even more surprising that this was overlooked until now. -- Peter Geoghegan
On Thu, Apr 21, 2022 at 4:28 PM Peter Geoghegan <pg@bowt.ie> wrote: > I don't think that there is any risk of one user of either variable > "clobbering" some other user -- the current values of the variables > are not actually meaningful at all. They're only useful as a way that > an arbitrary piece of code instruments an arbitrary operation, by > making their own copies, running whatever the operation is, and then > reporting on the deltas. Which makes it even more surprising that this > was overlooked until now. I suppose code like pgstat_update_dbstats() would need to copy pgBufferUsage somewhere if we were to get rid of pgStatBlockReadTime and pgStatBlockWriteTime. That might not have been acceptable back when we had the old stats collector; frequent copying of pgBufferUsage might have non-trivial overhead. The relevant struct (BufferUsage) has over 10 64-bit integers, versus only 2 for pgStatBlockReadTime and pgStatBlockWriteTime. But does that matter anymore now that we have the cumulative stats system? Doesn't the redundancy seem like a problem? -- Peter Geoghegan