pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
От | Andres Freund |
---|---|
Тема | pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |
Дата | |
Msg-id | 20200124195226.lth52iydq2n2uilq@alap3.anarazel.de обсуждение исходный текст |
Ответы |
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
(Magnus Hagander <magnus@hagander.net>)
|
Список | pgsql-hackers |
Hi, Currently pg_stat_bgwriter.buffers_backend is pretty useless to gauge whether backends are doing writes they shouldn't do. That's because it counts things that are either unavoidably or unlikely doable by other parts of the system (checkpointer, bgwriter). In particular extending the file can not currently be done by any another type of process, yet is counted. When using a buffer access strategy it is also very likely that writes have to be done by the 'dirtying' backend itself, as the buffer will be reused soon after (when not previously in s_b that is). Additionally pg_stat_bgwriter.buffers_backend also counts writes done by autovacuum et al. I think it'd make sense to at least split buffers_backend into buffers_backend_extend, buffers_backend_write, buffers_backend_write_strat but it could also be worthwhile to expand it into buffers_backend_extend, buffers_{backend,checkpoint,bgwriter,autovacuum}_write buffers_{backend,autovacuum}_write_stat Possibly by internally, in contrast to SQL level, having just counter arrays indexed by backend types. It's also noteworthy that buffers_backend is accounted in an absurd manner. One might think that writes are accounted from backend -> shared memory or such. But instead it works like this: 1) backend flushes buffer in bufmgr.c, accounts for backend *write time* 2) mdwrite writes and registers a sync request, which forwards the sync request to checkpointer 3) ForwardSyncRequest(), when not called by bgwriter, increments CheckpointerShmem->num_backend_writes 4) checkpointer, whenever doing AbsorbSyncRequests(), moves CheckpointerShmem->num_backend_writes to BgWriterStats.m_buf_written_backend (local memory!) 5) Occasionally it calls pgstat_send_bgwriter(), which sends the data to pgstat (which bgwriter also does) 6) Which then updates the shared memory used by the display functions Worthwhile to note that backend buffer read/write *time* is accounted differently. That's done via pgstat_send_tabstat(). I think there's very little excuse for the indirection via checkpointer, besides architectually being weird, it actually requires that we continue to wake up checkpointer over and over instead of optimizing how and when we submit fsync requests. As far as I can tell we're also simply not accounting at all for writes done outside of shared buffers. All writes done directly through smgrwrite()/extend() aren't accounted anywhere as far as I can tell. I think we also count things as writes that aren't writes: mdtruncate() is AFAICT counted as one backend write for each segment. Which seems weird to me. Lastly, I don't understand what the point of sending fixed size stats, like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While I don't like it's architecture, we obviously need something like pgstat to handle variable amounts of stats (database, table level etc stats). But that doesn't at all apply to these types of global stats. Greetings, Andres Freund
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Mark DilgerДата:
Сообщение: Re: making the backend's json parser work in frontend code
Следующее
От: Ranier VilelaДата:
Сообщение: [PATCH] /src/backend/access/transam/xlog.c, tiny improvements