Re: Add index scan progress to pg_stat_progress_vacuum

From: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2022-01-11 20:33:16
Message-ID: 40222BEA-9C20-433F-8476-58179F3547A7@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/11/22, 1:01 PM, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote:

On 1/10/22, 5:01 PM, "Imseih (AWS), Sami" <simseih(at)amazon(dot)com> wrote:
> I have attached the 3rd revision of the patch which also includes the documentation changes. Also attached is a rendered html of the docs for review.
>
> "max_index_vacuum_cycle_time" has been removed.
> "index_rows_vacuumed" renamed to "index_tuples_removed". "tuples" is a more consistent with the terminology used.
> "vacuum_cycle_ordinal_position" renamed to "index_ordinal_position".

Thanks for the new version of the patch!

nitpick: I get one whitespace error when applying the patch.

Applying: Expose progress for the "vacuuming indexes" phase of a VACUUM operation.
.git/rebase-apply/patch:44: tab in indent.
Whenever <xref linkend="guc-vacuum-failsafe-age"/> is triggered, index
warning: 1 line adds whitespace errors.

That was missed. Will fix it.

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>num_indexes_to_vacuum</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The number of indexes that will be vacuumed. Only indexes with
+ <literal>pg_index.indisready</literal> set to "true" will be vacuumed.
+ Whenever <xref linkend="guc-vacuum-failsafe-age"/> is triggered, index
+ vacuuming will be bypassed.
+ </para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>

We may want to avoid exhaustively listing the cases when this value
will be zero. I would suggest saying, "When index cleanup is skipped,
this value will be zero" instead.

What about something like "The number of indexes that are eligible for vacuuming".
This covers the cases where either an individual index is skipped or the entire "index vacuuming" phase is skipped.

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>relid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the table being vacuumed.
+ </para></entry>
+ </row>

Do we need to include this field? I would expect indexrelid to go
here.

Having indexrelid and relid makes the pg_stat_progress_vacuum_index view "self-contained". A user can lookup the index and table being vacuumed without joining back to pg_stat_progress_vacuum.

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>leader_pid</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Process ID of the parallel group leader. This field is <literal>NULL</literal>
+ if this process is a parallel group leader or the
+ <literal>vacuuming indexes</literal> phase is not performed in parallel.
+ </para></entry>
+ </row>

Are there cases where the parallel group leader will have an entry in
this view when parallelism is enabled?

Yes. A parallel group leader can perform an index vacuum just like a parallel worker. If you do something like "vacuum (parallel 3) ", you may have up to 4 processes vacuuming indexes. The leader + 3 workers.

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>index_ordinal_position</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The order in which the index is being vacuumed. Indexes are vacuumed by OID in ascending order.
+ </para></entry>
+ </row>

Should we include the bit about the OID ordering? I suppose that is
unlikely to change in the near future, but I don't know if it is
relevant information. Also, do we need to include the "index_"
prefix? This view is specific for indexes. (I have the same question
for index_tuples_removed.)

I was on the fence about both of these as well. Will make a change to this.

Should this new table go after the "VACUUM phases" table? It might
make sense to keep the phases table closer to where it is referenced.

I did not think that would read better. The introduction discusses both views and the "phase" table is linked from the pg_stat_progress_vacuum

+ /* Advertise the number of indexes to vacuum if we are not in failsafe mode */
+ if (!lazy_check_wraparound_failsafe(vacrel))
+ pgstat_progress_update_param(PROGRESS_VACUUM_TOTAL_INDEX_VACUUM, vacrel->nindexes);

Shouldn't this be 0 when INDEX_CLEANUP is off, too?

This view is only covering the "vacuum index" phase, but it should also cover index_cleanup phase as well. Will update the patch.

+#define PROGRESS_VACUUM_CURRENT_INDRELID 7
+#define PROGRESS_VACUUM_LEADER_PID 8
+#define PROGRESS_VACUUM_INDEX_ORDINAL 9
+#define PROGRESS_VACUUM_TOTAL_INDEX_VACUUM 10
+#define PROGRESS_VACUUM_DEAD_TUPLES_VACUUMED 11

nitpick: I would suggest the following names to match the existing
style:

PROGRESS_VACUUM_NUM_INDEXES_TO_VACUUM
PROGRESS_VACUUM_INDEX_LEADER_PID
PROGRESS_VACUUM_INDEX_INDEXRELID
PROGRESS_VACUUM_INDEX_ORDINAL_POSITION
PROGRESS_VACUUM_INDEX_TUPLES_REMOVED

That looks better.

Nathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2022-01-11 20:34:26 Re: Add jsonlog log_destination for JSON server logs
Previous Message Heikki Linnakangas 2022-01-11 20:17:18 Re: Boyer-More-Horspool searching LIKE queries