Re: Report index currently being vacuumed in pg_stat_progress_vacuum

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report index currently being vacuumed in pg_stat_progress_vacuum
Date: 2026-06-29 15:31:12
Message-ID: CALj2ACU3+seeZKAsH=d_LpVEVRfafnvqDmj3czFf1daOTbXVyw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Tue, May 5, 2026 at 10:54 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> I think it is valuable to show the index being processed. There is
> really no other easy way to get this information except for pstack,
> etc. I am +1 for the idea.

Thanks for reviewing this!

> However, I am not sure that having a separate row for every parallel
> worker is the right approach. The pg_stat_progress_* views are designed
> to show progress per row. Each row represents one command with
> meaningful progress counters (heap_blks_scanned, indexes_total,
> indexes_processed, etc.). A parallel worker row would only show
> current_index_relid and leader_pid with no actual progress information
> of its own. That is status, not progress, and it does not fit the
> view. Also, many columns would remain empty or redundant with the
> leader's row.
>
> Instead, could we aggregate the parallel worker information into the
> leader's row. For example, an array of worker PIDs in one column and an
> array of index relids in another?

I read f1889729, and it looks like the preference there was to keep
one command = one row, with workers feeding the leader's row rather
than showing up as separate rows. I want to stay with that approach.

I considered having the leader set up a DSA that workers write into,
shown as an extra column on the leader's row. But that means new
shared memory whose handle has to be stored somewhere other backends
can find it, attached by the reader, and freed safely while a
monitoring query might still be reading it. That is a lot of work for
a small amount of per-worker data.

The simpler approach is to have workers report the index they're
currently on into their own st_progress_param[] slots, which already
exist per backend, along with their leader's PID. The view then groups
the worker entries under the matching leader's PID and shows only the
leader rows, so no new shared memory is needed. The current index OID
has to be added to that array anyway for the non-parallel case, which
needs it just as much, and the parallel workers already have their own
slots to report into, so there's nothing extra to set up.

One thing to note is that pg_stat_get_progress_info('VACUUM') itself
would still return the worker rows; the grouping happens in the
pg_stat_progress_vacuum view instead. I prefer keeping it in the view
rather than the function. The function is shared by all the progress
commands and only deals with raw params, while the leader-PID grouping
is specific to VACUUM. This way the shared function stays unchanged
and the other progress views are not affected.
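
To make the grouping concrete, here is a rough Python sketch of what
the view would do with the raw per-backend rows. This is only an
illustration, not the view definition: the field names (pid,
leader_pid, current_index_relid), the array-style output columns, and
the convention that a leader row has no leader_pid are all assumptions
made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RawRow:
    """One raw per-backend progress entry (hypothetical field names)."""
    pid: int
    leader_pid: Optional[int]           # assumed None for a leader or non-parallel VACUUM
    current_index_relid: Optional[int]  # None when the backend is not on an index


@dataclass
class LeaderRow:
    """One row per VACUUM command, as the view would show it."""
    pid: int
    current_index_relid: Optional[int]
    worker_pids: List[int] = field(default_factory=list)
    worker_index_relids: List[Optional[int]] = field(default_factory=list)


def group_under_leader(rows: List[RawRow]) -> List[LeaderRow]:
    """Fold worker entries into the row of the leader they point at."""
    # Every entry without a leader_pid becomes a visible row.
    leaders = {r.pid: LeaderRow(r.pid, r.current_index_relid)
               for r in rows if r.leader_pid is None}

    # Attach each worker to its leader; workers never get a row of their own.
    for r in sorted(rows, key=lambda row: row.pid):
        if r.leader_pid is None:
            continue
        leader = leaders.get(r.leader_pid)
        if leader is None:
            continue  # leader entry not present: skip the orphan worker
        leader.worker_pids.append(r.pid)
        leader.worker_index_relids.append(r.current_index_relid)

    return [leaders[pid] for pid in sorted(leaders)]
```

With a leader (pid 100) and two workers pointing at it, the result is
a single row for pid 100 carrying both worker PIDs and the index each
one is on, while a non-parallel VACUUM simply yields its own row with
empty worker lists.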

While I'm here, in the "vacuuming indexes" phase I also want to report
the total index pages to scan and the pages scanned so far, for the
index currently being vacuumed. On a large index this phase can run
for a long time with no way to tell whether it's making progress.

Does this direction sound reasonable, or do you see a reason to prefer
a different approach?

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
