Re: Add index scan progress to pg_stat_progress_vacuum
От | Imseih (AWS), Sami |
---|---|
Тема | Re: Add index scan progress to pg_stat_progress_vacuum |
Дата | |
Msg-id | 04D9F80C-1A7C-4C23-9156-6799D974DC72@amazon.com обсуждение исходный текст |
Ответ на | Re: Add index scan progress to pg_stat_progress_vacuum ("Imseih (AWS), Sami" <simseih@amazon.com>) |
Список | pgsql-hackers |
Attached is the latest revision of the patch. In "pg_stat_progress_vacuum", introduce 2 columns: * total_index_vacuum : This is the # of indexes that will be vacuumed. Keep in mind that if failsafe mode kicks in mid-flightto the vacuum, Postgres may choose to forgo index scans. This value will be adjusted accordingly. * max_index_vacuum_cycle_time : The total elapsed time for a index vacuum cycle is calculated and this value will be updatedto reflect the longest vacuum cycle. Until the first cycle completes, this value will be 0. The purpose of this columnis to give the user an idea of how long an index vacuum cycle takes to complete. postgres=# \d pg_stat_progress_vacuum View "pg_catalog.pg_stat_progress_vacuum" Column | Type | Collation | Nullable | Default -----------------------------+---------+-----------+----------+--------- pid | integer | | | datid | oid | | | datname | name | | | relid | oid | | | phase | text | | | heap_blks_total | bigint | | | heap_blks_scanned | bigint | | | heap_blks_vacuumed | bigint | | | index_vacuum_count | bigint | | | max_dead_tuples | bigint | | | num_dead_tuples | bigint | | | total_index_vacuum | bigint | | | max_index_vacuum_cycle_time | bigint | | | Introduce a new view called "pg_stat_progress_vacuum_index". This view will track the progress of a worker ( or leader PID) while it's vacuuming an index. It will expose some key columns: * pid: The PID of the worker process * leader_pid: The PID of the leader process. This is the column that can be joined with "pg_stat_progress_vacuum". leader_pidand pid can have the same value as a leader can also perform an index vacuum. * indrelid: The relid of the index currently being vacuumed * vacuum_cycle_ordinal_position: The processing position of the index being vacuumed. This can be useful to determine howmany indexes out of the total indexes ( pg_stat_progress_vacuum.total_index_vacuum ) have been vacuumed * index_tuples_vacuumed: This is the number of index tuples vacuumed for the index overall. This is useful to show that thevacuum is actually doing work, as the # of tuples keeps increasing. postgres=# \d pg_stat_progress_vacuum_index View "pg_catalog.pg_stat_progress_vacuum_index" Column | Type | Collation | Nullable | Default -------------------------------+---------+-----------+----------+--------- pid | integer | | | leader_pid | bigint | | | indrelid | bigint | | | vacuum_cycle_ordinal_position | bigint | | | index_tuples_vacuumed | bigint | | | On 12/27/21, 6:12 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote: I do agree that tracking progress by # of blocks scanned is not deterministic for all index types. Based on this feedback, I went back to the drawing board on this. Something like below may make more sense. In pg_stat_progress_vacuum, introduce 2 new columns: 1. total_index_vacuum - total # of indexes to vacuum 2. max_cycle_time - the time in seconds of the longest index cycle. Introduce another view called pg_stat_progress_vacuum_index_cycle: postgres=# \d pg_stat_progress_vacuum_index_cycle View "public.pg_stat_progress_vacuum_worker" Column | Type | Collation | Nullable | Default ----------------+---------+-----------+----------+--------- pid | integer | | | <<<-- the PID of the vacuum worker ( or leader if it'sdoing index vacuuming ) leader_pid | bigint | | | <<<-- the leader PID to allow this view to be joinedback to pg_stat_progress_vacuum indrelid | bigint | | | <<<- the index relid of the index being vacuumed ordinal_position | bigint | | | <<<- the processing position, which will give an ideaof the processing position of the index being vacuumed. dead_tuples_removed | bigint | | <<<- the number of dead rows removed in the current cycle for the index. Having this information, one can 1. Determine which index is being vacuumed. For monitoring tools, this can help identify the index that accounts formost of the index vacuuming time. 2. Having the processing order of the current index will allow the user to determine how many of the total indexes hasbeen completed in the current cycle. 3. dead_tuples_removed will show progress on the index vacuum in the current cycle. 4. the max_cycle_time will give an idea on how long the longest index cycle took for the current vacuum operation. On 12/23/21, 2:46 AM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless youcan confirm the sender and know the content is safe. On Tue, Dec 21, 2021 at 3:37 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Dec 15, 2021 at 2:10 PM Bossart, Nathan <bossartn@amazon.com> wrote: > > nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed? IMO it > > is more analogous to heap_blks_vacuumed. > > +1. > > > This will tell us which indexes are currently being vacuumed and the > > current progress of those operations, but it doesn't tell us which > > indexes have already been vacuumed or which ones are pending vacuum. > > VACUUM will process a table's indexes in pg_class OID order (outside > of parallel VACUUM, I suppose). See comments about sort order above > RelationGetIndexList(). Right. > > Anyway, it might be useful to add ordinal numbers to each index, that > line up with this processing/OID order. It would also be reasonable to > display the same number in log_autovacuum* (and VACUUM VERBOSE) > per-index output, to reinforce the idea. Note that we don't > necessarily display a distinct line for each distinct index in this > log output, which is why including the ordinal number there makes > sense. An alternative idea would be to show the number of indexes on the table and the number of indexes that have been processed in the leader's entry of pg_stat_progress_vacuum. Even in parallel vacuum cases, since we have index vacuum status for each index it would not be hard for the leader process to count how many indexes have been processed. Regarding the details of the progress of index vacuum, I'm not sure this progress information can fit for pg_stat_progress_vacuum. As Peter already mentioned, the behavior quite varies depending on index AM. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: Sadhuprasad PatroДата:
Сообщение: Per-table storage parameters for TableAM/IndexAM extensions