Re: Add index scan progress to pg_stat_progress_vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2022-03-09 01:53:44
Message-ID: CAD21AoBW6SMJ96CNoMeu+f_BR4jmatPcfVA016FdD2hkLDsaTA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 9, 2022 at 12:41 AM Imseih (AWS), Sami <simseih(at)amazon(dot)com> wrote:
>
> > +/*
> > + * vacuum_worker_init --- initialize this module's shared memory hash
> > + * to track the progress of a vacuum worker
> > + */
> > +void
> > +vacuum_worker_init(void)
> > +{
> > +	HASHCTL		info;
> > +	long		max_table_size = GetMaxBackends();
> > +
> > +	VacuumWorkerProgressHash = NULL;
> > +
> > +	info.keysize = sizeof(pid_t);
> > +	info.entrysize = sizeof(VacProgressEntry);
> > +
> > +	VacuumWorkerProgressHash = ShmemInitHash("Vacuum Progress Hash",
> > +											 max_table_size,
> > +											 max_table_size,
> > +											 &info,
> > +											 HASH_ELEM | HASH_BLOBS);
> > +}
>
> > It seems to me that creating a shmem hash with max_table_size entries
> > for parallel vacuum process tracking is too much. IIRC an old patch
> > had parallel vacuum workers advertise its progress and changed the
> > pg_stat_progress_vacuum view so that it aggregates the results
> > including workers' stats. I think it’s better than the current one.
> > Why did you change that?
>
> > Regards,
>
> I was trying to avoid using shared memory to track completed indexes, but aggregating stats does not work with parallel vacuum. This is because a parallel worker will exit before the vacuum completes, causing the aggregated total to be wrong.
>
> For example
>
> Leader_pid advertises it completed 2 indexes
> Parallel worker advertises it completed 2 indexes
>
> When aggregating we see 4 indexes completed.
>
> After the parallel worker exits, the aggregation will show only 2 indexes completed.

Indeed.

It might have already been discussed, but other than using a new shmem
hash for parallel vacuum, I wonder if we can allow workers to update
the leader's progress information directly. It would break the
assumption that a backend status entry is modified only by its own
backend, though. But it might also help with progress updates for
other parallel operations. This essentially does the same thing as the
current patch but doesn't require a new shmem hash.
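
Roughly, I'm thinking of something like the following (just a sketch,
not from the patch: the function name and the way the leader's entry
pointer reaches the worker are made up, and since the st_changecount
protocol assumes a single writer, the leader and workers would also
need to serialize on a lock, which is not shown):

#include "postgres.h"
#include "utils/backend_status.h"

/*
 * Hypothetical helper: a parallel vacuum worker increments a progress
 * counter in the leader's backend status entry.  "leader_beentry" would
 * have to be handed to the worker, e.g. through the parallel shared area.
 */
static void
vacuum_worker_bump_leader_progress(volatile PgBackendStatus *leader_beentry,
								   int index, int64 incr)
{
	PGSTAT_BEGIN_WRITE_ACTIVITY(leader_beentry);
	leader_beentry->st_progress_param[index] += incr;
	PGSTAT_END_WRITE_ACTIVITY(leader_beentry);
}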

Another idea I came up with is that the parallel vacuum leader checks
PVIndStats.status and updates its own progress information with the
number of indexes processed so far. The leader can check it and update
the progress information before and after index vacuuming. And
possibly we can add a callback to the main loop of the index AM's
bulkdelete and vacuumcleanup so that the leader can periodically keep
it up to date.
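
Something along these lines (again only a sketch: it assumes the code
lives in vacuumparallel.c where ParallelVacuumState and PVIndStats are
visible, and PROGRESS_VACUUM_INDEXES_COMPLETED is a made-up parameter
name standing in for whatever the patch ends up defining):

/*
 * Hypothetical helper run by the leader only: count the indexes whose
 * status has reached PARALLEL_INDVAC_STATUS_COMPLETED and report that
 * number via the leader's own progress entry.  This could be called
 * before/after index vacuuming, or from a callback in the index AM's
 * bulkdelete/vacuumcleanup loop.
 */
static void
parallel_vacuum_update_progress(ParallelVacuumState *pvs)
{
	int			ncompleted = 0;

	for (int i = 0; i < pvs->nindexes; i++)
	{
		if (pvs->indstats[i].status == PARALLEL_INDVAC_STATUS_COMPLETED)
			ncompleted++;
	}

	pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_COMPLETED,
								 ncompleted);
}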

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
