| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Daniil Davydov <3danissimo(at)gmail(dot)com> |
| Cc: | Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: POC: Parallel processing of indexes in autovacuum |
| Date: | 2026-01-15 02:13:16 |
| Message-ID: | CAD21AoBT1LWqPZkcHpVMVh0ZOXUneO=p61t0i8cQ+kOP9qfODQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Jan 7, 2026 at 1:51 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Jan 6, 2026 at 3:44 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 6, 2026 at 1:51 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Are you still working on it? Or shall I draft this part on top of the
> > > 0001 patch?
> >
> > I thought about some "beautiful" approach, but for now I have
> > only one idea - force parallel a/v workers to get values for these
> > parameters from shmem (which obviously can be modified by the
> > leader a/v process). I'll post this patch in the near future.
> >
>
> I am posting a draft version of the patch (see 0005) that allows the
> parallel leader to propagate changes of cost-based parameters to its
> parallel workers. It is a very rough fix, but it reflects my idea, which is
> to have some shared state from which parallel workers can get values for
> the parameters (and which only the leader worker can modify, obviously).
>
> I have also added a test that shows that this idea works - it ensures
> that parallel workers can change their parameters if those parameters
> have been changed for the leader worker.
>
> Note that this work is still in progress - the logic currently covers only
> the vacuum_cost_delay and vacuum_cost_limit parameters. I think that we
> should agree on the idea first, and only then apply the logic to all
> appropriate parameters.
>
> What do you think?
Thank you for updating the patches! Here are some review comments.
* 0001 patch
+static void
+autovacuum_worker_before_shmem_exit(int code, Datum arg)
+{
+ if (code != 0)
+ AutoVacuumReleaseAllParallelWorkers();
+
+ Assert(av_nworkers_reserved == 0);
+}
While adding the assertion here makes sense, it has no effect in
non-assert builds. I guess it's safer to call
AutoVacuumReleaseAllParallelWorkers() regardless of the exit code, to
ensure that no autovacuum worker exits while still holding parallel workers.
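Something like this is what I have in mind (a sketch based on the snippet
above):

static void
autovacuum_worker_before_shmem_exit(int code, Datum arg)
{
    /*
     * Release any reserved parallel workers regardless of the exit code,
     * so an autovacuum worker can never exit while still holding them.
     */
    AutoVacuumReleaseAllParallelWorkers();

    Assert(av_nworkers_reserved == 0);
}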
---
+ before_shmem_exit(autovacuum_worker_before_shmem_exit, 0);
I think it would be better to register this callback later, e.g. just
before the main loop that processes the tables; registering it very
early serves no purpose.
---
+ /*
+ * Cap the number of free workers by new parameter's value, if needed.
+ */
+ AutoVacuumShmem->av_freeParallelWorkers =
+ Min(AutoVacuumShmem->av_freeParallelWorkers,
+ autovacuum_max_parallel_workers);
+
+ if (autovacuum_max_parallel_workers > prev_max_parallel_workers)
+ {
+ /*
+ * If user wants to increase number of parallel autovacuum workers, we
+ * must increase number of free workers.
+ */
+ AutoVacuumShmem->av_freeParallelWorkers +=
+ (autovacuum_max_parallel_workers - prev_max_parallel_workers);
+ }
Suppose the previous autovacuum_max_parallel_workers is 5 and 2 workers
are reserved (i.e., there are 3 free parallel workers). If
autovacuum_max_parallel_workers changes to 2, the new
AutoVacuumShmem->av_freeParallelWorkers would be 2 based on the above
code, but I believe the new number of free workers should be 0, as 2
workers are already running. What do you think? I guess we can
calculate the new number of free workers by:
Max((autovacuum_max_parallel_workers - prev_max_parallel_workers) +
AutoVacuumShmem->av_freeParallelWorkers, 0)
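That single clamped expression would replace both the Min() cap and the
increase branch above; roughly (a sketch, assuming it runs under the same
lock as the code above and that the intermediate value is signed):

    AutoVacuumShmem->av_freeParallelWorkers =
        Max(AutoVacuumShmem->av_freeParallelWorkers +
            (autovacuum_max_parallel_workers - prev_max_parallel_workers),
            0);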
---
I've attached a patch proposing some minor changes.
* 0002 patch
+ /*
+ * Number of planned and actually launched parallel workers for all index
+ * scans, or NULL
+ */
+ PVWorkersUsage *workers_usage;
I think that LVRelState can have PVWorkersUsage instead of a pointer to it.
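That is, roughly (a sketch; the surrounding fields are elided):

typedef struct LVRelState
{
    /* ... existing fields ... */

    /* Totals of planned/launched parallel workers across index scans */
    PVWorkersUsage workers_usage;   /* embedded value, not a pointer */
} LVRelState;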
---
+ /*
+ * Allocate space for workers usage statistics. Thus, we explicitly
+ * make clear that such statistics must be accumulated. For now, this
+ * is used only by autovacuum leader worker, because it must log it in
+ * the end of table processing.
+ */
+ vacrel->workers_usage = AmAutoVacuumWorkerProcess() ?
+ (PVWorkersUsage *) palloc0(sizeof(PVWorkersUsage)) :
+ NULL;
I think we can report the worker statistics in VACUUM VERBOSE logs as
well. Currently VACUUM VERBOSE reports the worker usage only during
index vacuuming, but it would make sense to report the overall
statistics in the vacuum logs; that would help keep VACUUM VERBOSE logs
and autovacuum logs consistent.
But we don't need to report the worker usage if we didn't use the
parallel vacuum (i.e., if nplanned == 0).
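A sketch of what the allocation could look like, assuming it stays
pointer-based and runs somewhere the vacuum options are at hand (e.g.
heap_vacuum_rel()):

    /* Accumulate worker usage for autovacuum logging and VACUUM VERBOSE */
    if (AmAutoVacuumWorkerProcess() ||
        (params->options & VACOPT_VERBOSE) != 0)
        vacrel->workers_usage =
            (PVWorkersUsage *) palloc0(sizeof(PVWorkersUsage));
    else
        vacrel->workers_usage = NULL;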
---
+ /* Remember these values, if we asked to. */
+ if (wusage != NULL)
+ {
+ wusage->nlaunched += pvs->pcxt->nworkers_launched;
+ wusage->nplanned += nworkers;
+ }
This code runs after the attempt to reserve parallel workers.
Consequently, if we fail to reserve any workers due to
autovacuum_max_parallel_workers, we report the status as if parallel
vacuum wasn't planned at all. I think knowing the number of workers
that were planned but not reserved would provide valuable insight for
users tuning autovacuum_max_parallel_workers.
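For example, the planned count could be accumulated before the
reservation attempt (a sketch; 'nworkers' is the requested count, per
the snippet above):

    /*
     * Count planned workers up front so that a failed reservation still
     * shows up as planned-but-not-launched.
     */
    if (wusage != NULL)
        wusage->nplanned += nworkers;

    /* ... reserve and launch the workers ... */

    if (wusage != NULL)
        wusage->nlaunched += pvs->pcxt->nworkers_launched;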
---
+ if (vacrel->workers_usage)
+ appendStringInfo(&buf,
+ _("parallel index vacuum/cleanup :
workers planned = %d, workers launched = %d\n"),
+ vacrel->workers_usage->nplanned,
+ vacrel->workers_usage->nlaunched);
Since these numbers are the totals of workers planned and launched, how
about changing it to something like "parallel index vacuum/cleanup: %d
workers were planned and %d workers were launched in total"?
* 0003 patch
+typedef enum AVLeaderFailureType
+{
+ FAIL_NONE,
+ FAIL_ERROR,
+ FAIL_FATAL,
+} AVLeaderFailureType;
I'm concerned that this somewhat overlaps with what injection points
already do, as we can set 'error' in injection_points_attach(). For the
FATAL error, we can use pg_terminate_backend() to terminate an
autovacuum worker that keeps waiting on an injection point attached
with action='wait'.
---
+ /*
+ * Injection point to help exercising number of available parallel
+ * autovacuum workers.
+ */
+ INJECTION_POINT("autovacuum-set-free-parallel-workers-num",
+ &AutoVacuumShmem->av_freeParallelWorkers);
This injection point is added in two places. IIUC its purpose is to
update the free_parallel_workers field of InjPointState, whose value is
then read by the get_parallel_autovacuum_free_workers() SQL function
during the TAP test. I guess it's better to have
get_parallel_autovacuum_free_workers() directly check
av_freeParallelWorkers with proper locking.
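For instance, the SQL function could look roughly like this (a sketch;
it assumes the test module can reach AutoVacuumShmem, that
av_freeParallelWorkers fits in an int, and that it is protected by
AutovacuumLock):

Datum
get_parallel_autovacuum_free_workers(PG_FUNCTION_ARGS)
{
    int     nfree;

    /* Read the shared counter under the lock that protects it */
    LWLockAcquire(AutovacuumLock, LW_SHARED);
    nfree = AutoVacuumShmem->av_freeParallelWorkers;
    LWLockRelease(AutovacuumLock);

    PG_RETURN_INT32(nfree);
}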
---
It would be great if we could test the av_freeParallelWorkers
adjustment when autovacuum_max_parallel_workers changes.
* 0005 patch
+typedef struct PVSharedCostParams
+{
+ slock_t spinlock; /* protects all fields below */
+
+ /* Copies of corresponding parameters from autovacuum leader process */
+ double cost_delay;
+ int cost_limit;
+} PVSharedCostParams;
Since parallel workers don't reload the config file, I think the other
vacuum-delay-related parameters, such as VacuumCostPage{Miss|Hit|Dirty},
also need to be shared by the leader.
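That is, the shared struct could carry copies of all of them (a sketch;
the new field names are mine):

typedef struct PVSharedCostParams
{
    slock_t     spinlock;        /* protects all fields below */

    /* Copies of the leader's cost-based delay parameters */
    double      cost_delay;      /* vacuum_cost_delay */
    int         cost_limit;      /* vacuum_cost_limit */
    int         cost_page_hit;   /* VacuumCostPageHit */
    int         cost_page_miss;  /* VacuumCostPageMiss */
    int         cost_page_dirty; /* VacuumCostPageDirty */
} PVSharedCostParams;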
---
+ if (!AmAutoVacuumWorkerProcess())
+ {
+ /*
+ * If we are autovacuum parallel worker, check whether cost-based
+ * parameters had changed in leader worker.
+ * If so, vacuum_cost_delay and vacuum_cost_limit will be set to the
+ * values which leader worker is operating on.
+ *
+ * Do it before checking VacuumCostActive, because its value might be
+ * changed after leader's parameters consumption.
+ */
+ parallel_vacuum_fix_cost_based_params();
+ }
We need to add a check to prevent a normal backend running the VACUUM
command from calling parallel_vacuum_fix_cost_based_params().
IIUC autovacuum parallel workers call
parallel_vacuum_fix_cost_based_params() and update their
vacuum_cost_{delay|limit} on every vacuum_delay_point() call.
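One way to express such a check (a sketch; 'leader_is_autovacuum' is a
hypothetical flag the leader would set when initializing the shared
struct):

    /*
     * Sync only in parallel workers whose leader is an autovacuum worker;
     * plain backends running VACUUM (and their parallel workers) skip it.
     */
    if (IsParallelWorker() &&
        pv_shared_cost_params != NULL &&
        pv_shared_cost_params->leader_is_autovacuum)
        parallel_vacuum_fix_cost_based_params();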
---
+/*
+ * Function to be called from parallel autovacuum worker in order to sync
+ * some cost-based delay parameter with the leader worker.
+ */
+bool
+parallel_vacuum_fix_cost_based_params(void)
+{
The 'fix' doesn't sound right to me, as nothing is actually broken. How
about something like parallel_vacuum_update_shared_delay_params()?
+ Assert(IsParallelWorker() && !AmAutoVacuumWorkerProcess());
+
+ SpinLockAcquire(&pv_shared_cost_params->spinlock);
+
+ vacuum_cost_delay = pv_shared_cost_params->cost_delay;
+ vacuum_cost_limit = pv_shared_cost_params->cost_limit;
+
+ SpinLockRelease(&pv_shared_cost_params->spinlock);
IIUC autovacuum parallel workers update their vacuum_cost_{delay|limit}
on every vacuum_delay_point() call, which doesn't seem good. Can we
somehow avoid unnecessary updates?
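One idea is a generation counter that the leader bumps whenever it
changes the parameters, so workers can take a cheap fast path when
nothing has changed (a sketch; the 'generation' field and the local
cache are additions of mine, and I use the renamed function from above):

typedef struct PVSharedCostParams
{
    slock_t          spinlock;   /* protects cost_delay and cost_limit */
    pg_atomic_uint32 generation; /* bumped by the leader on each change */
    double           cost_delay;
    int              cost_limit;
} PVSharedCostParams;

static uint32 last_seen_generation = 0;

bool
parallel_vacuum_update_shared_delay_params(void)
{
    uint32      gen;

    gen = pg_atomic_read_u32(&pv_shared_cost_params->generation);

    /* Fast path: nothing has changed since we last synced. */
    if (gen == last_seen_generation)
        return false;

    SpinLockAcquire(&pv_shared_cost_params->spinlock);
    vacuum_cost_delay = pv_shared_cost_params->cost_delay;
    vacuum_cost_limit = pv_shared_cost_params->cost_limit;
    SpinLockRelease(&pv_shared_cost_params->spinlock);

    last_seen_generation = gen;
    return true;
}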
---
+
+ if (vacuum_cost_delay > 0 && !VacuumFailsafeActive)
+ VacuumCostActive = true;
+
Should we consider the case of disabling VacuumCostActive as well?
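For example, recomputing it in both directions (a sketch):

    /* Deactivate as well when the leader's delay drops to zero */
    VacuumCostActive = (vacuum_cost_delay > 0 && !VacuumFailsafeActive);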
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| 0001_masahiko.patch | application/octet-stream | 4.4 KB |