Re: Add index scan progress to pg_stat_progress_vacuum

From: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2021-12-28 00:13:16
Message-ID: 7F1C93C1-C50E-497A-86C5-B637EA4D9F79@amazon.com
Lists: pgsql-hackers

I do agree that tracking progress by # of blocks scanned is not deterministic for all index types.

Based on this feedback, I went back to the drawing board on this.

Something like below may make more sense.

In pg_stat_progress_vacuum, introduce 2 new columns:

1. total_index_vacuum - total # of indexes to vacuum
2. max_cycle_time - the time in seconds of the longest index cycle.

Introduce another view called pg_stat_progress_vacuum_index_cycle:

postgres=# \d pg_stat_progress_vacuum_index_cycle
       View "public.pg_stat_progress_vacuum_index_cycle"
       Column        |  Type   | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
 pid                 | integer |           |          |          <<<-- the PID of the vacuum worker (or of the leader, if it is doing index vacuuming)
 leader_pid          | bigint  |           |          |          <<<-- the leader PID, so this view can be joined back to pg_stat_progress_vacuum
 indrelid            | bigint  |           |          |          <<<-- the relid of the index being vacuumed
 ordinal_position    | bigint  |           |          |          <<<-- the position of the index being vacuumed in the processing order
 dead_tuples_removed | bigint  |           |          |          <<<-- the number of dead rows removed from the index in the current cycle

With this information, one can:

1. Determine which index is being vacuumed. For monitoring tools, this can help identify the index that accounts for most of the index vacuuming time.
2. Knowing the processing position of the current index allows the user to determine how many of the total indexes have been completed in the current cycle.
3. dead_tuples_removed will show the progress of the index vacuum in the current cycle.
4. max_cycle_time will give an idea of how long the longest index cycle took for the current vacuum operation.

On 12/23/21, 2:46 AM, "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com> wrote:


On Tue, Dec 21, 2021 at 3:37 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Dec 15, 2021 at 2:10 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> > nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed? IMO it
> > is more analogous to heap_blks_vacuumed.
>
> +1.
>
> > This will tell us which indexes are currently being vacuumed and the
> > current progress of those operations, but it doesn't tell us which
> > indexes have already been vacuumed or which ones are pending vacuum.
>
> VACUUM will process a table's indexes in pg_class OID order (outside
> of parallel VACUUM, I suppose). See comments about sort order above
> RelationGetIndexList().

Right.

>
> Anyway, it might be useful to add ordinal numbers to each index, that
> line up with this processing/OID order. It would also be reasonable to
> display the same number in log_autovacuum* (and VACUUM VERBOSE)
> per-index output, to reinforce the idea. Note that we don't
> necessarily display a distinct line for each distinct index in this
> log output, which is why including the ordinal number there makes
> sense.

An alternative idea would be to show the number of indexes on the
table and the number of indexes that have been processed in the
leader's entry of pg_stat_progress_vacuum. Even in parallel vacuum
cases, since we have the index vacuum status for each index, it would
not be hard for the leader process to count how many indexes have been
processed.

Regarding the details of the progress of index vacuum, I'm not sure
this progress information fits in pg_stat_progress_vacuum. As
Peter already mentioned, the behavior varies quite a bit depending on
the index AM.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
