Re: Add index scan progress to pg_stat_progress_vacuum

Поиск
Список
Период
Сортировка
От Imseih (AWS), Sami
Тема Re: Add index scan progress to pg_stat_progress_vacuum
Дата
Msg-id 04D9F80C-1A7C-4C23-9156-6799D974DC72@amazon.com
обсуждение исходный текст
Ответ на Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Список pgsql-hackers
Attached is the latest revision of the patch.

In "pg_stat_progress_vacuum", introduce 2 columns:

* total_index_vacuum : This is the # of indexes that will be vacuumed. Keep in mind that if failsafe mode kicks in
mid-flightto the vacuum, Postgres may choose to forgo index scans. This value will be adjusted accordingly.
 
* max_index_vacuum_cycle_time : The total elapsed time for a index vacuum cycle is calculated and this value will be
updatedto reflect the longest vacuum cycle. Until the first cycle completes, this value will be 0. The purpose of this
columnis to give the user an idea of how long an index vacuum cycle takes to complete.
 

postgres=# \d pg_stat_progress_vacuum
View "pg_catalog.pg_stat_progress_vacuum"
Column | Type | Collation | Nullable | Default
-----------------------------+---------+-----------+----------+---------
pid | integer | | |
datid | oid | | |
datname | name | | |
relid | oid | | |
phase | text | | |
heap_blks_total | bigint | | |
heap_blks_scanned | bigint | | |
heap_blks_vacuumed | bigint | | |
index_vacuum_count | bigint | | |
max_dead_tuples | bigint | | |
num_dead_tuples | bigint | | |
total_index_vacuum | bigint | | |
max_index_vacuum_cycle_time | bigint | | |



Introduce a new view called "pg_stat_progress_vacuum_index". This view will track the progress of a worker ( or leader
PID) while it's vacuuming an index. It will expose some key columns:
 

* pid: The PID of the worker process

* leader_pid: The PID of the leader process. This is the column that can be joined with "pg_stat_progress_vacuum".
leader_pidand pid can have the same value as a leader can also perform an index vacuum.
 

* indrelid: The relid of the index currently being vacuumed

* vacuum_cycle_ordinal_position: The processing position of the index being vacuumed. This can be useful to determine
howmany indexes out of the total indexes ( pg_stat_progress_vacuum.total_index_vacuum ) have been vacuumed
 

* index_tuples_vacuumed: This is the number of index tuples vacuumed for the index overall. This is useful to show that
thevacuum is actually doing work, as the # of tuples keeps increasing. 
 

postgres=# \d pg_stat_progress_vacuum_index
View "pg_catalog.pg_stat_progress_vacuum_index"
Column | Type | Collation | Nullable | Default
-------------------------------+---------+-----------+----------+---------
pid | integer | | |
leader_pid | bigint | | |
indrelid | bigint | | |
vacuum_cycle_ordinal_position | bigint | | |
index_tuples_vacuumed | bigint | | |








On 12/27/21, 6:12 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:

    I do agree that tracking progress by # of blocks scanned is not deterministic for all index types.

    Based on this feedback, I went back to the drawing board on this. 

    Something like below may make more sense.

    In pg_stat_progress_vacuum, introduce 2 new columns:

    1. total_index_vacuum   - total # of indexes to vacuum
    2. max_cycle_time - the time in seconds of the longest index cycle. 

    Introduce another view called pg_stat_progress_vacuum_index_cycle:

    postgres=# \d pg_stat_progress_vacuum_index_cycle
           View "public.pg_stat_progress_vacuum_worker"
         Column     |  Type   | Collation | Nullable | Default
    ----------------+---------+-----------+----------+---------
    pid            | integer |           |          |                <<<-- the PID of the vacuum worker ( or leader if
it'sdoing index vacuuming )
 
    leader_pid     | bigint  |           |          |                <<<-- the leader PID to allow this view to be
joinedback to pg_stat_progress_vacuum
 
    indrelid       | bigint  |           |          |                <<<- the index relid of the index being vacuumed
    ordinal_position | bigint |           |          |                <<<- the processing position, which will give an
ideaof the processing position of the index being vacuumed. 
 
    dead_tuples_removed | bigint | |                <<<- the number of dead rows removed in the current cycle for the
index.

    Having this information, one can

    1. Determine which index is being vacuumed. For monitoring tools, this can help identify the index that accounts
formost of the index vacuuming time.
 
    2. Having the processing order of the current index will allow the user to determine how many of the total indexes
hasbeen completed in the current cycle.
 
    3. dead_tuples_removed will show progress on the index vacuum in the current cycle.
    4. the max_cycle_time will give an idea on how long the longest index cycle took for the current vacuum operation.


    On 12/23/21, 2:46 AM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:

        CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless
youcan confirm the sender and know the content is safe.
 



        On Tue, Dec 21, 2021 at 3:37 AM Peter Geoghegan <pg@bowt.ie> wrote:
        >
        > On Wed, Dec 15, 2021 at 2:10 PM Bossart, Nathan <bossartn@amazon.com> wrote:
        > > nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed?  IMO it
        > > is more analogous to heap_blks_vacuumed.
        >
        > +1.
        >
        > > This will tell us which indexes are currently being vacuumed and the
        > > current progress of those operations, but it doesn't tell us which
        > > indexes have already been vacuumed or which ones are pending vacuum.
        >
        > VACUUM will process a table's indexes in pg_class OID order (outside
        > of parallel VACUUM, I suppose). See comments about sort order above
        > RelationGetIndexList().

        Right.

        >
        > Anyway, it might be useful to add ordinal numbers to each index, that
        > line up with this processing/OID order. It would also be reasonable to
        > display the same number in log_autovacuum* (and VACUUM VERBOSE)
        > per-index output, to reinforce the idea. Note that we don't
        > necessarily display a distinct line for each distinct index in this
        > log output, which is why including the ordinal number there makes
        > sense.

        An alternative idea would be to show the number of indexes on the
        table and the number of indexes that have been processed in the
        leader's entry of pg_stat_progress_vacuum. Even in parallel vacuum
        cases, since we have index vacuum status for each index it would not
        be hard for the leader process to count how many indexes have been
        processed.

        Regarding the details of the progress of index vacuum, I'm not sure
        this progress information can fit for pg_stat_progress_vacuum. As
        Peter already mentioned, the behavior quite varies depending on index
        AM.

        Regards,


        --
        Masahiko Sawada
        EDB:  https://www.enterprisedb.com/



Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: PublicationActions - use bit flags.
Следующее
От: Sadhuprasad Patro
Дата:
Сообщение: Per-table storage parameters for TableAM/IndexAM extensions