Re: POC: Parallel processing of indexes in autovacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Daniil Davydov <3danissimo(at)gmail(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: 2026-03-31 21:19:41
Message-ID: CAD21AoAvZc6Rwi1hZ7x+U3vz7AMMSpcbQ2JBn6+WmQp-3yfKMg@mail.gmail.com

On Tue, Mar 31, 2026 at 7:18 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Mar 31, 2026 at 2:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've made some changes to the documentation part, merged two patches
> > into one, and updated the commit message. Please review the attached
> > patch.
> >
>
> Great, thank you very much!
>
> Again, I don't know how to write documentation well, so you can ignore
> my comments:
>
> > + <command>VACUUM</command> can perform index vacuuming and index cleanup
> Don't we need to mention autovacuum here too? I thought that VACUUM in this
> context means "manual VACUUM command".

I think that the documentation explains that the autovacuum daemon is
a worker automatically executing VACUUM and ANALYZE commands.

>
> > + ...applies specifically to the index vacuuming and index cleanup phases...
> Maybe we can refer to "vacuum-phases" here?

Agreed.

>
> All other changes look good to me.
>
> !!!
> > Searching for arguments in
> > favor of opt-in style, I asked for help from another person who has been
> > managing the setup of highload systems for decades. He promised to share his
> > opinion next week.
>
> I talked to Anton Doroshkevich today.

Thank you for sharing!

> He confirmed that as a rule there are *hundreds of thousands* of tables in the
> system, the vast majority of which do not need to be vacuumed in parallel mode.

I'm still struggling to see the technical justification; why would a
user want to avoid parallel vacuuming on eligible tables if they have
already explicitly allowed the system to use more resources by setting
autovacuum_max_parallel_workers to >0? If resource contention occurs,
it is typically a sign that the global parameters need re-tuning. As I
mentioned, the same contention can occur even with an opt-in style if
multiple tables are manually configured.

Also, I'm concerned that an opt-in style could confuse users, since
parallel vacuum is enabled by default in the VACUUM command.
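
To make the two styles concrete, here is a minimal sketch of how the
per-table worker request might be resolved under each default. The
function name, the use of None for "reloption not set", and the exact
semantics are assumptions for illustration only, not what the patch
implements.

```python
from typing import Optional


def requested_workers(reloption: Optional[int], guc_max: int, opt_in: bool) -> int:
    """Hypothetical: parallel workers autovacuum may request for one table.

    reloption: per-table setting; None when the table has no explicit value.
    guc_max:   stand-in for autovacuum_max_parallel_workers.
    opt_in:    True  -> tables without a reloption are never processed in parallel.
               False -> tables without a reloption may use up to guc_max workers.
    """
    # The GUC acts as a global switch: 0 disables parallel autovacuum everywhere.
    if guc_max <= 0:
        return 0
    # No per-table setting: the two styles differ only in this branch.
    if reloption is None:
        return 0 if opt_in else guc_max
    # An explicit per-table value is honored but capped by the GUC.
    return max(0, min(reloption, guc_max))
```

In this sketch the two styles behave identically for any table that has
an explicit reloption; they differ only in what happens to the tables
nobody has configured.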

> He also suggested the following: let the reloption override the value of the
> GUC parameter. I.e., even if the av_max_parallel_workers parameter is 0, the
> user can still set av_parallel_workers to 10 for some table, and autovacuum
> will process this table in parallel.
>
> I remember that you want to use the GUC parameter as a global switch, and this
> approach will break this logic. But according to Anton's words, it is okay if
> the GUC parameter cannot disable parallel a/v for all tables instantly. It will
> become an administrator's responsibility to manually turn off parallel a/v for
> several tables (again, it is completely OK). Thus, this feature can be handy
> for all use cases.

While some autovacuum parameters do override GUCs, those are typically
local to the process (like cost delay). Parallel workers, however, are
a shared system-wide resource. In a multi-tenant environment, allowing
a single table's reloption to bypass the
autovacuum_max_parallel_workers = 0 limit could lead to unexpected
exhaustion of the worker pool. I think that this GUC should act as a
reliable global switch for resource management.
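
As a rough illustration of this concern, the sketch below hands out
workers from a shared pool with and without letting the reloption
bypass the GUC. Everything here is hypothetical: launch_plan, the
pool_size stand-in for the shared worker pool, and the first-come
allocation order are assumptions, not the patch's code.

```python
from typing import Dict, Tuple


def launch_plan(
    table_requests: Dict[str, int],
    guc_max: int,
    pool_size: int,
    reloption_overrides_guc: bool,
) -> Tuple[Dict[str, int], int]:
    """Hypothetical: grant workers per table from one shared pool.

    table_requests: table name -> per-table reloption value.
    guc_max:        stand-in for autovacuum_max_parallel_workers.
    pool_size:      stand-in for the system-wide pool of parallel workers.
    Returns (workers granted per table, workers left in the pool).
    """
    remaining = pool_size
    granted: Dict[str, int] = {}
    for name, reloption in table_requests.items():
        # With override, the GUC no longer limits what a table may ask for.
        wanted = reloption if reloption_overrides_guc else min(reloption, guc_max)
        got = max(0, min(wanted, remaining))
        granted[name] = got
        remaining -= got
    return granted, remaining
```

With guc_max set to 0 and the override disabled, no table receives any
workers and the pool stays untouched. With the override enabled, the
first table that asks for 10 workers can take the whole pool even
though the GUC says 0.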

> I hope this doesn't look like adapting to the needs of a specific user.
> A lot of super-large productions are migrating to postgres now, and I believe
> that we should ensure their comfort too.

I'm not prioritizing one specific use case over another. I believe
that there are also users who want to use parallel vacuum on hundreds
of thousands of tables. We should look for a better solution while
evaluating it from multiple perspectives, such as usability,
robustness, and consistency with existing features and behaviors.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
