Re: Add index scan progress to pg_stat_progress_vacuum

From: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2022-01-13 03:52:46
Message-ID: 7A4B3BA8-0768-4463-9B25-25DBECF56B25@amazon.com
Lists: pgsql-hackers

On 1/12/22, 1:28 PM, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote:

> On 1/11/22, 11:46 PM, "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Regarding the new pg_stat_progress_vacuum_index view, why do we need
>> to have a separate view? Users will have to check two views. If this
>> view is expected to be used together with and joined to
>> pg_stat_progress_vacuum, why don't we provide one view that has full
>> information from the beginning? Especially, I think it's not useful
>> that the total number of indexes to vacuum (num_indexes_to_vacuum
>> column) and the current number of indexes that have been vacuumed
>> (index_ordinal_position column) are shown in separate views.

> I suppose we could add all of the new columns to
> pg_stat_progress_vacuum and just set columns to NULL as appropriate.
> But is that really better than having a separate view?

To add: since a vacuum can use parallel worker processes in addition to the main vacuum process to perform index vacuuming, it made sense to report the backends doing index vacuum/cleanup in a separate view.
Besides what Nathan suggested, the only other clean option I can think of is to add a json column to pg_stat_progress_vacuum that includes all the new fields. My concern with this approach is that users would have to flatten the json themselves, which hurts usability.
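
To illustrate the usability concern, here is a minimal Python sketch of the flattening a client would have to do if the per-index fields were packed into a json column. The column name index_progress and the keys worker_pid and indexrelid are hypothetical and made up for this example; only index_tuples_removed comes from the discussion above, and none of this is what the patch implements.

```python
import json

# Hypothetical row from pg_stat_progress_vacuum with an assumed json column
# "index_progress". Neither the column nor its keys exist in PostgreSQL.
row = {
    "pid": 1234,
    "relid": 16384,
    "phase": "vacuuming indexes",
    "index_progress": json.dumps([
        {"worker_pid": 1234, "indexrelid": 16390, "index_tuples_removed": 500},
        {"worker_pid": 1301, "indexrelid": 16391, "index_tuples_removed": 120},
    ]),
}


def flatten(row):
    """Return one flat dict per index entry, repeating the parent columns."""
    parent = {k: v for k, v in row.items() if k != "index_progress"}
    entries = json.loads(row["index_progress"])
    return [{**parent, **entry} for entry in entries]


flat_rows = flatten(row)
for flat in flat_rows:
    print(flat)
```

With a separate view, each of these flat rows would simply be a row a user can select or join directly, with no unpacking step.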

>> Also, I’m not sure how useful index_tuples_removed is; what can we
>> infer from this value (without a total number)?

> I think the idea was that you can compare it against max_dead_tuples
> and num_dead_tuples to get an estimate of the current cycle progress.
> Otherwise, it just shows that progress is being made.

The main purpose is to show that the "index vacuum" phase is actually making progress. Note that for certain index types (GIN/GiST), index_tuples_removed can end up exceeding num_dead_tuples.
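
As a rough sketch of the comparison Nathan describes, a monitoring script could turn the two counters into a fraction for the current cycle. The function name is hypothetical, and capping the result at 1.0 is my own assumption for handling the GIN/GiST case, where the removed count can exceed num_dead_tuples; it should be read as an estimate only.

```python
def index_cycle_progress(index_tuples_removed, num_dead_tuples):
    """Estimate the fraction of the current index vacuum cycle completed.

    Returns None when num_dead_tuples gives nothing to compare against.
    The ratio is capped at 1.0 because index_tuples_removed may exceed
    num_dead_tuples for some index types.
    """
    if num_dead_tuples <= 0:
        return None
    return min(index_tuples_removed / num_dead_tuples, 1.0)


print(index_cycle_progress(250, 1000))
print(index_cycle_progress(1500, 1000))
```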

Nathan

