Обсуждение: Re: pgsql: Use pre-fetching for ANALYZE

Поиск
Список
Период
Сортировка

Re: pgsql: Use pre-fetching for ANALYZE

От
Andres Freund
Дата:
Hi,

On 2021-03-16 18:48:08 +0000, Stephen Frost wrote:
> Use pre-fetching for ANALYZE
> 
> When we have posix_fadvise() available, we can improve the performance
> of an ANALYZE by quite a bit by using it to inform the kernel of the
> blocks that we're going to be asking for.  Similar to bitmap index
> scans, the number of buffers pre-fetched is based off of the
> maintenance_io_concurrency setting (for the particular tablespace or,
> if not set, globally, via get_tablespace_maintenance_io_concurrency()).

I just looked at this as part of debugging a crash / hang in the AIO patch.

The code does:

        block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);

#ifdef USE_PREFETCH

        /*
         * When pre-fetching, after we get a block, tell the kernel about the
         * next one we will want, if there's any left.
         *
         * We want to do this even if the table_scan_analyze_next_block() call
         * above decides against analyzing the block it picked.
         */
        if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
            PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
#endif

I.e. we lock a buffer and *then* we prefetch another buffer. That seems like a
quite bad idea to me. Why are we doing IO while holding a content lock, if we
can avoid it?

Greetings,

Andres Freund



Re: pgsql: Use pre-fetching for ANALYZE

От
Andres Freund
Дата:
Hi,

On 2022-06-02 19:30:16 -0700, Andres Freund wrote:
> On 2021-03-16 18:48:08 +0000, Stephen Frost wrote:
> > Use pre-fetching for ANALYZE
> > 
> > When we have posix_fadvise() available, we can improve the performance
> > of an ANALYZE by quite a bit by using it to inform the kernel of the
> > blocks that we're going to be asking for.  Similar to bitmap index
> > scans, the number of buffers pre-fetched is based off of the
> > maintenance_io_concurrency setting (for the particular tablespace or,
> > if not set, globally, via get_tablespace_maintenance_io_concurrency()).
> 
> I just looked at this as part of debugging a crash / hang in the AIO patch.
> 
> The code does:
> 
>         block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);
> 
> #ifdef USE_PREFETCH
> 
>         /*
>          * When pre-fetching, after we get a block, tell the kernel about the
>          * next one we will want, if there's any left.
>          *
>          * We want to do this even if the table_scan_analyze_next_block() call
>          * above decides against analyzing the block it picked.
>          */
>         if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
>             PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
> #endif
> 
> I.e. we lock a buffer and *then* we prefetch another buffer. That seems like a
> quite bad idea to me. Why are we doing IO while holding a content lock, if we
> can avoid it?

It also seems decidedly not great from a layering POV to do the IO in
analyze.c. There's no guarantee that the tableam maps blocks in a way that's
compatible with PrefetchBuffer().  Yes, the bitmap heap scan code does
something similar, but a) that is opt in by the AM, b) there's a comment
saying it's quite crufty and should be fixed.

Greetings,

Andres Freund



Re: pgsql: Use pre-fetching for ANALYZE

От
Stephen Frost
Дата:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2022-06-02 19:30:16 -0700, Andres Freund wrote:
> > On 2021-03-16 18:48:08 +0000, Stephen Frost wrote:
> > > Use pre-fetching for ANALYZE
> > >
> > > When we have posix_fadvise() available, we can improve the performance
> > > of an ANALYZE by quite a bit by using it to inform the kernel of the
> > > blocks that we're going to be asking for.  Similar to bitmap index
> > > scans, the number of buffers pre-fetched is based off of the
> > > maintenance_io_concurrency setting (for the particular tablespace or,
> > > if not set, globally, via get_tablespace_maintenance_io_concurrency()).
> >
> > I just looked at this as part of debugging a crash / hang in the AIO patch.
> >
> > The code does:
> >
> >         block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);
> >
> > #ifdef USE_PREFETCH
> >
> >         /*
> >          * When pre-fetching, after we get a block, tell the kernel about the
> >          * next one we will want, if there's any left.
> >          *
> >          * We want to do this even if the table_scan_analyze_next_block() call
> >          * above decides against analyzing the block it picked.
> >          */
> >         if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
> >             PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
> > #endif
> >
> > I.e. we lock a buffer and *then* we prefetch another buffer. That seems like a
> > quite bad idea to me. Why are we doing IO while holding a content lock, if we
> > can avoid it?

At the end, we're doing a posix_fadvise() which is a kernel call but
hopefully wouldn't do actual IO when we call it.  Still, agreed that
it'd be better to do that without holding locks and no objection to
making such a change.

> It also seems decidedly not great from a layering POV to do the IO in
> analyze.c. There's no guarantee that the tableam maps blocks in a way that's
> compatible with PrefetchBuffer().  Yes, the bitmap heap scan code does
> something similar, but a) that is opt in by the AM, b) there's a comment
> saying it's quite crufty and should be fixed.

Certainly open to suggestions.  Are you thinking it'd make sense to add
a 'prefetch_block' method to TableAmRoutine?  Or did you have another
thought?

Thanks!

Stephen

Вложения