Re: Add index scan progress to pg_stat_progress_vacuum

From: Masahiko Sawada
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date:
Msg-id: CAD21AoBW6SMJ96CNoMeu+f_BR4jmatPcfVA016FdD2hkLDsaTA@mail.gmail.com
In reply to: Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses: Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
List: pgsql-hackers
On Wed, Mar 9, 2022 at 12:41 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> /*
>  * vacuum_worker_init --- initialize this module's shared memory hash
>  * to track the progress of a vacuum worker
>  */
> void
> vacuum_worker_init(void)
> {
>     HASHCTL     info;
>     long        max_table_size = GetMaxBackends();
>
>     VacuumWorkerProgressHash = NULL;
>
>     info.keysize = sizeof(pid_t);
>     info.entrysize = sizeof(VacProgressEntry);
>
>     VacuumWorkerProgressHash = ShmemInitHash("Vacuum Progress Hash",
>                                              max_table_size,
>                                              max_table_size,
>                                              &info,
>                                              HASH_ELEM | HASH_BLOBS);
> }
>
> It seems to me that creating a shmem hash with max_table_size entries
> for parallel vacuum process tracking is too much. IIRC an old patch
> had parallel vacuum workers advertise their progress and changed the
> pg_stat_progress_vacuum view so that it aggregates the results,
> including workers' stats. I think that's better than the current
> approach. Why did you change that?
>
> Regards,
>
> I was trying to avoid using shared memory to track completed indexes, but aggregating stats does not work with
parallel vacuums. This is because a parallel worker exits before the vacuum completes, causing the aggregated total to
be wrong.
>
> For example
>
> Leader_pid advertises it completed 2 indexes
> Parallel worker advertises it completed 2 indexes
>
> When aggregating we see 4 indexes completed.
>
> After the parallel worker exits, the aggregation will show only 2 indexes completed.

Indeed.

It might have already been discussed, but other than using a new shmem
hash for parallel vacuum, I wonder if we can allow workers to update
the leader's progress information. It would break the assumption that
a backend status entry is modified only by its own backend, though.
But it might also help with progress updates for other parallel
operations. This essentially does the same thing as the current patch
but doesn't require a new shmem hash.

Another idea I came up with is that the parallel vacuum leader checks
PVIndStats.status and updates the number of processed indexes in its
own progress information. The leader can check it and update the
progress information before and after index vacuuming. And possibly we
can add a callback to the main loop of an index AM's bulkdelete and
vacuumcleanup so that the leader can periodically keep it up to date.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


