Re: pg_stat_io for the startup process

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: pg_stat_io for the startup process
Дата
Msg-id CAAKRu_b9FVaBaOK7xfKoSD8-X_M0XBKmXS14nEatciGOM96+Kg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg_stat_io for the startup process  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers
I've reviewed both your patch versions and responded to the ideas in
both of them and the associated emails below.

On Wed, Apr 26, 2023 at 5:47 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Tue, 25 Apr 2023 16:04:23 -0700, Andres Freund <andres@anarazel.de> wrote in
> > key those stats by oid. However, it *does* report the read/write time. But
> > only at process exit, of course.  The weird part is that the startup process
> > does *NOT* increase pg_stat_database.blks_read/blks_hit, because instead of
> > basing those on pgBufferUsage.shared_blks_read etc, we compute them based on
> > the relation level stats. pgBufferUsage is just used for EXPLAIN.  This isn't
> > recent, afaict.
>
> I see four issues here.
>
> 1. The current database stats omits buffer fetches that don't
>   originate from a relation.
>
> In this case pgstat_relation can't work since recovery isn't conscious
> of relids.  We might be able to resolve relfilenode into a relid, but
> it may not be that simple. Fortunately we already count fetches and
> hits process-wide using pgBufferUsage, so we can use this for database
> stats.
...
> > TL;DR: Currently the startup process maintains blk_read_time, blk_write_time,
> > but doesn't maintain blks_read, blks_hit - which doesn't make sense.
>
> As a result, the attached patch, which is meant for discussion, allows
> pg_stat_database to show fetches and reads by the startup process as
> the counts for the database with id 0.

I would put this in its own patch in a patchset. Of course it relies on
having pgstat_report_stat() called at appropriate times by the startup
process, but having pg_stat_database show read/hit counts is a separate
issue than having pg_stat_io do so. I'm not suggesting we do this, but
you could argue that if we fix the startup process stats reporting that
pg_stat_database not showing reads and hits for the startup process is a
bug that also exists in 15.

Not directly related, but I do not get why the existing stats counters
for pg_stat_database count "fetches" and "hits" and then use subtraction
to get the number of reads. I find it confusing and seems like it could
lead to subtle inconsistencies with those counters counting reads closer
to where they are actually happening.

> 2. Even if we wanted to report stats for the startup process,
>   pgstat_report_stats wouldn't permit it since transaction-end
>   timestamp doesn't advance.
>
> I'm not certain if it's the correct approach, but perhaps we could use
> GetCurrentTimestamp() instead of GetCurrentTransactionStopTimestamp()
> specifically for the startup process.

In theory, since all of the approaches proposed in this thread would
exercise rigid control over how often we flush stats in the startup
process, I think it is okay to use GetCurrentTimestamp() when
pgstat_report_stat() is called by the startup process (i.e. we don't
have to worry about overhead of doing it). But looking at it implemented
in the patch made me feel unsettled for some reason.

> 3. When should we call pgstat_report_stats on the startup process?
>
> During recovery, I think we can call pgstat_report_stats() (or a
> subset of it) right before invoking WaitLatch and at segment
> boundaries.

I see in the patch you call pgstat_report_stat() in XLogPageRead(). Will
this only be called on segment boundaries?

> 4. In the existing ReadBuffer_common, there's an inconsistency in
> counting hits and reads between pgstat_io and pgBufferUsage.
>
> The difference comes from the case of RBM_ZERO pages. We should simply
> align them.

I would definitely make this a separate patch and probably a separate
thread. It isn't related to the startup process and is worth a separate
discussion.

On Thu, Apr 27, 2023 at 10:43 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Fri, 28 Apr 2023 11:15:51 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > At Thu, 27 Apr 2023 17:30:40 -0400, Melanie Plageman <melanieplageman@gmail.com> wrote in
> > > After a quick example implementation of this, I found that it seemed to
> > > try and flush the stats less often on an idle standby (good) than using
> > > enable_timeout_every().
> >
> > Just rearming with the full-interval will work that way. Our existing
> > strategy for this is seen in PostgresMain().
> >
> >    stats_timeout = pgstat_report_stat(false);
> >    if (stats_timeout > 0)
> >    {
> >       if (!get_timeout_active(BLAH_TIMEOUT))
> >           enable_timeout_after(BLAH_TIMEOUT, stats_timeout);
> >    }
> >    else
> >    {
> >        if (get_timeout_active(BLAH_TIMEOUT))
> >            disable_timeout(BLAH_TIMEOUT, false);
> >    }
> >    WaitLatch();
>
> Im my example, I left out idle-time flushing, but I realized we don't
> need the timeout mechanism here since we're already managing it. So
> the following should work (assuming the timestamp updates with
> GetCurrentTimestamp() in my last patch).
>
> @@ -3889,13 +3900,23 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
>                                         /* Update pg_stat_recovery_prefetch before sleeping. */
>                                         XLogPrefetcherComputeStats(xlogprefetcher);
>
> +                                       /*
> +                                        * Report stats; if not time yet, set next WaitLatch to
> +                                        * wake up at the next reporing time.
> +                                        */
> +                                       wait_time = pgstat_report_stat(false);
> +
> +                                       /* if no pending stats, sleep forever */
> +                                       if (wait_time == 0)
> +                                               wait_time = -1L;
> +
>                                         /*
>                                          * Wait for more WAL to arrive, when we will be woken
>                                          * immediately by the WAL receiver.
>                                          */
>                                         (void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
>                                                                          WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
> -                                                                        -1L,
> +                                                                        wait_time,
>                                                                          WAIT_EVENT_RECOVERY_WAL_STREAM);

The idle-time flushing is a great point I did not think of. I think
Andres did have some concern with unconditionally calling
pgstat_report_stat() in WaitForWalToBecomeAvailable() before WaitLatch()
-- I believe because it would be called too often and attempt flushing
multiple times between HandleStartupProcInterrupts().

- Melanie



В списке pgsql-hackers по дате отправления:

Предыдущее
От: gkokolatos@pm.me
Дата:
Сообщение: Re: Add LZ4 compression in pg_dump
Следующее
От: Andres Freund
Дата:
Сообщение: Re: issue with meson builds on msys2