Re: Add index scan progress to pg_stat_progress_vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2022-12-14 01:43:10
Message-ID: CAD21AoAjSYmTWgWvuJhDa7Dwi0wZLJBh+1irybjCLRVh6jvJKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 13, 2022 at 1:40 PM Imseih (AWS), Sami <simseih(at)amazon(dot)com> wrote:
>
> Thanks for the feedback. I agree with the feedback, except
> for
>
> > need to have ParallelVacuumProgress. I see
> > parallel_vacuum_update_progress() uses this value but I think it's
> > better to pass ParallelVacuumState to via IndexVacuumInfo.
>
> I was trying to avoid passing a pointer to
> ParallelVacuumState in IndexVacuuminfo.
>
> ParallelVacuumProgress is implemented in the same
> way as VacuumSharedCostBalance and
> VacuumActiveNWorkers. See vacuum.h
>
> These values are reset at the start of a parallel vacuum cycle
> and reset at the end of an index vacuum cycle.
>
> This seems like a better approach and less invasive.
> What would be a reason not to go with this approach?

First of all, I don't think we need to declare ParallelVacuumProgress
in vacuum.c since it's set and used only in vacuumparallel.c. But I
don't even think it's a good idea to declare it in vacuumparallel.c as
a static variable. The primary reason is that it adds things we need
to care about. For example, what if we raise an error during index
vacuum? The transaction aborts but ParallelVacuumProgress still refers
to something old. Suppose further that the next parallel vacuum
doesn't launch any workers, the leader process would still end up
accessing the old value pointed by ParallelVacuumProgress, which
causes a SEGV. So we need to reset it anyway at the beginning of the
parallel vacuum. It's easy to fix at this time but once the parallel
vacuum code gets more complex, it could forget to care about it.

IMO VacuumSharedCostBalance and VacuumActiveNWorkers have a different
story. They are set in vacuumparallel.c and are used in vacuum.c for
vacuum delay. If they weren't global variables, we would need to pass
them to every function that could eventually call the vacuum delay
function. So it makes sense to me to have them as global variables.On
the other hand, for ParallelVacuumProgress, it's a common pattern that
ambulkdelete(), amvacuumcleanup() or a common index scan routine like
btvacuumscan() checks the progress. I don't think index AM needs to
pass the value down to many of its functions. So it makes sense to me
to pass it via IndexVacuumInfo.

Having said that, I'd like to hear opinions also from other hackers, I
might be wrong and it's more invasive as you pointed out.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2022-12-14 02:29:56 Avoid extra "skipping" messages from VACUUM/ANALYZE
Previous Message Kyotaro Horiguchi 2022-12-14 01:35:02 Re: Time delayed LR (WAS Re: logical replication restrictions)