Re: POC: Parallel processing of indexes in autovacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Daniil Davydov <3danissimo(at)gmail(dot)com>
Cc: Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: 2026-01-05 18:51:11
Message-ID: CAD21AoB7v5tLPXLK=qmtt6PaEC1f+Fb-gh+MwAbXfm6x4eZGNw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 23, 2025 at 7:02 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Sun, Nov 23, 2025 at 5:51 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
> >
> > > > nworkers has a double meaning. The return value of
> > > > AutoVacuumReserveParallelWorkers
> > > > is nreserved. I think this should be
> > > >
> > > > ```
> > > > nreserved = AutoVacuumReserveParallelWorkers(nworkers);
> > > > ```
> > > >
> > > > and nreserved becomes the authoritative value for the number of parallel
> > > > workers after that point.
> >
> > I could not find this pattern being used in the code base.
> > I think it will be better and more in-line without what we generally do
> > and pass-by-reference and update the value inside
> > AutoVacuumReserveParallelWorkers:
> >
> > ```
> > AutoVacuumReserveParallelWorkers(&nworkers).
> > ```
>
> Maybe I just don't like functions with side effects, but this function will
> have ones anyway. I'll add logic with passing by reference as you
> suggested.
>
> >
> > >> ---
> > >> { name => 'autovacuum_max_parallel_workers', type => 'int', context =>
> > >> 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
> > >> short_desc => 'Maximum number of parallel autovacuum workers, that
> > >> can be taken from bgworkers pool.',
> > >> long_desc => 'This parameter is capped by "max_worker_processes"
> > >> (not by "autovacuum_max_workers"!).',
> > >> variable => 'autovacuum_max_parallel_workers',
> > >> boot_val => '0',
> > >> min => '0',
> > >> max => 'MAX_BACKENDS',
> > >> },
> > >>
> > >> Parallel vacuum in autovacuum can be used only when users set the
> > >> autovacuum_parallel_workers storage parameter. How about using the
> > >> default value 2 for autovacuum_max_parallel_workers GUC parameter?
> >
> > > Sounds reasonable, +1 for it.
> >
> > v15-0004 has an incorrect default value for `autovacuum_max_parallel_workers`.
> > It should now be 2.
> >
> > + Sets the maximum number of parallel autovacuum workers that
> > + can be used for parallel index vacuuming at one time. Is capped by
> > + <xref linkend="guc-max-worker-processes"/>. The default is 0,
> > + which means no parallel index vacuuming.
>
> Thanks for noticing it! Fixed.
>
> I am sending an updated set of patches.

Thank you for updating the patch! I've reviewed the 0001 patch and
here are some comments:

---
+ /* Remember how many workers we have reserved. */
+ av_nworkers_reserved += *nworkers;

I think we can simply assign *nworkers to av_nworkers_reserved instead
of incrementing it as we're sure that av_nworkers_reserved is 0 at the
beginning of this function.

---
+static void
+AutoVacuumReleaseAllParallelWorkers(void)
+{
+ /* Only leader worker can call this function. */
+ Assert(AmAutoVacuumWorkerProcess() && !IsParallelWorker());
+
+ if (av_nworkers_reserved > 0)
+ AutoVacuumReleaseParallelWorkers(av_nworkers_reserved);
+}

We can put an assertion at the end of the function to verify that this
worker doesn't reserve any worker.

---
+static void
+autovacuum_worker_before_shmem_exit(int code, Datum arg)
+{
+ if (code != 0)
+ AutoVacuumReleaseAllParallelWorkers();
+}

I think it would be more future-proof if we call
AutoVacuumReleaseAllParallelWorkers() regardless of the code if there
is no strong reason why we check the code there.

---
+ before_shmem_exit(autovacuum_worker_before_shmem_exit, 0);

/*
* Create a per-backend PGPROC struct in shared memory. We must do this
* before we can use LWLocks or access any shared memory.
*/
InitProcess();

I think it's better to register the
autovacuum_worker_before_shmem_exit() after the process
initialization. The function could use LWLocks to release the reserved
workers. Given that AutoVacuumReleaseAllParallelWorkers() doesn't try
to release the reserved worker when av_nworkers_reserved == 0, but it
would be more future-proof to do that after the basic process
initialization processes.

How about renaming autovacuum_worker_before_shmem_exit() to
autovacuum_worker_onexit()?

---
IIUC the patch needs to implement some logic to propagate the updates
of vacuum delay parameters to parallel vacuum workers. Are you still
working on it? Or shall I draft this part on top of the 0001 patch?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2026-01-05 18:53:06 Re: Custom oauth validator options
Previous Message Masahiko Sawada 2026-01-05 18:47:01 Re: Newly created replication slot may be invalidated by checkpoint