Re: Add index scan progress to pg_stat_progress_vacuum

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Add index scan progress to pg_stat_progress_vacuum
Дата
Msg-id CAD21AoAjSYmTWgWvuJhDa7Dwi0wZLJBh+1irybjCLRVh6jvJKg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Ответы Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Список pgsql-hackers
On Tue, Dec 13, 2022 at 1:40 PM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> Thanks for the feedback. I agree with the feedback, except
> for
>
> >    need to have ParallelVacuumProgress. I see
> >    parallel_vacuum_update_progress() uses this value but I think it's
> >    better to pass ParallelVacuumState to via IndexVacuumInfo.
>
> I was trying to avoid passing a pointer to
> ParallelVacuumState in IndexVacuuminfo.
>
> ParallelVacuumProgress is implemented in the same
> way as VacuumSharedCostBalance and
> VacuumActiveNWorkers. See vacuum.h
>
> These values are reset at the start of a parallel vacuum cycle
> and reset at the end of an index vacuum cycle.
>
> This seems like a better approach and less invasive.
> What would be a reason not to go with this approach?

First of all, I don't think we need to declare ParallelVacuumProgress
in vacuum.c since it's set and used only in vacuumparallel.c. But I
don't even think it's a good idea to declare it in vacuumparallel.c as
a static variable. The primary reason is that it adds things we need
to care about. For example, what if we raise an error during index
vacuum? The transaction aborts but ParallelVacuumProgress still refers
to something old. Suppose further that the next parallel vacuum
doesn't launch any workers, the leader process would still end up
accessing the old value pointed by ParallelVacuumProgress, which
causes a SEGV. So we need to reset it anyway at the beginning of the
parallel vacuum. It's easy to fix at this time but once the parallel
vacuum code gets more complex, it could forget to care about it.

IMO VacuumSharedCostBalance and VacuumActiveNWorkers have a different
story. They are set in vacuumparallel.c and are used in vacuum.c for
vacuum delay. If they weren't global variables, we would need to pass
them to every function that could eventually call the vacuum delay
function. So it makes sense to me to have them as global variables.On
the other hand, for ParallelVacuumProgress, it's a common pattern that
ambulkdelete(), amvacuumcleanup() or a common index scan routine like
btvacuumscan() checks the progress. I don't think index AM needs to
pass the value down to many of its functions. So it makes sense to me
to pass it via IndexVacuumInfo.

Having said that, I'd like to hear opinions also from other hackers, I
might be wrong and it's more invasive as you pointed out.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Time delayed LR (WAS Re: logical replication restrictions)
Следующее
От: Jeff Davis
Дата:
Сообщение: Avoid extra "skipping" messages from VACUUM/ANALYZE