Re: POC: Parallel processing of indexes in autovacuum

From: Daniil Davydov <3danissimo(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: 2026-03-28 11:10:44
Message-ID: CAJDiXgj=-R1z7H7+npm-o+q6YBkr5_6Qe=1wcy47ovAqej4TkA@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, Mar 26, 2026 at 5:43 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Mar 25, 2026 at 12:45 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
> >
> > Searching for arguments in
> > favor of opt-in style, I asked for help from another person who has been
> > managing the setup of highload systems for decades. He promised to share his
> > opinion next week.
>
> Given that we have one and half weeks before the feature freeze, I
> think it's better to complete the project first before waiting for
> his/her comments next week. Even if we finish this feature with the
> opt-out style, we can hear more opinions on it and change the default
> behavior as the change would be trivial. What do you think?
>

Sure, if we can change the default value after the feature freeze, I don't
mind leaving our parameter in opt-out style for now.

> I've squashed all patches except for the documentation patch as I
> assume you're working on it. The attached fixup patch contains several
> changes: using opt-out style, comment improvements, and fixing typos
> etc.
>

Thank you very much for the proposed fixes!
I like the way you have changed nparallel_workers calculation (autovacuum.c).
Forcing parallel workers to always read the shared cost params the first time
is a good decision. All the comment changes also LGTM.

The only thing I changed is in reloptions.c:
As you explained, it is not appropriate to use the "overrides" wording
in the reloption's description, so I reverted to the old one.

On Fri, Mar 27, 2026 at 10:54 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Wed, Mar 25, 2026 at 3:43 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Given that we have one and half weeks before the feature freeze, I
> > think it's better to complete the project first before waiting for
> > his/her comments next week. Even if we finish this feature with the
> > opt-out style, we can hear more opinions on it and change the default
> > behavior as the change would be trivial. What do you think?
> >
> > I've squashed all patches except for the documentation patch as I
> > assume you're working on it. The attached fixup patch contains several
> > changes: using opt-out style, comment improvements, and fixing typos
> > etc.
>
> +1 for enabling this feature by default. When enough CPU is available,
> vacuuming multiple indexes of a table in parallel in autovacuum
> definitely speeds things up.

Yes, for sure. But I am concerned that enabling parallel a/v for everyone
will cause a shortage of parallel workers when processing the largest
tables.

> Thank you for sending the latest patches. I quickly reviewed the v31
> patches. Here are some comments.
>
> 1/ + {"autovacuum_parallel_workers", RELOPT_TYPE_INT,
>
> I haven't looked at the whole thread, but do we all think we need this
> as a relopt? IMHO, we can wait for field experience and introduce this
> later.

I think that we should keep both the reloption and the config parameter.
Getting rid of the reloption would greatly reduce users' ability to
tune this feature. I'm afraid that this may lead to people not using parallel
autovacuum.

> I'm having a hard time finding a use-case where one wants to
> disable the indexes at the table level. If there was already an
> agreement, I agree to commit to that decision.

You can read the discussion from [1] up to this message in order to dive
into the question.

To make a long story short, I think that the most common use case for this
feature is allowing parallel a/v for 2-3 tables, each of which has ~100
indexes. The rest of the tables do not require parallel processing (or at
least it is a much lower priority for them).

At the same time, Masahiko-san thinks that only the system should decide which
tables will be processed in parallel. The system's decision should be based on
the number of indexes and a few other config parameters (e.g.
min_parallel_index_scan_size). Thus, potentially many tables could be
processed in parallel.

(Both opinions are pretty simplified).

>
> 2/ + /*
> + * If 'true' then we are running parallel autovacuum. Otherwise, we are
> + * running parallel maintenance VACUUM.
> + */
> + bool is_autovacuum;
> +
>
> The variable name looks a bit confusing. How about we rely on
> AmAutoVacuumWorkerProcess() and avoid the bool in shared memory?

This variable is needed for parallel workers, which are taken from the
background worker pool, i.e. AmAutoVacuumWorkerProcess() will return 'false'
for them. We need the "is_autovacuum" variable in order to know exactly what
this process was started for (VACUUM (PARALLEL) or parallel autovacuum).

Thanks everyone for the review!
Please see the updated set of patches attached.
As I promised, I created a dedicated chapter describing parallel vacuum.
Both maintenance VACUUM and autovacuum now refer to this chapter.

I am fairly inexperienced at writing documentation, so forgive me if
something does not follow the documentation style.

[1] https://www.postgresql.org/message-id/CAJDiXggH1bW%3D4n%2B55CGLvs_sRU4SYNXwYLZ37wvJ5H_3yURSPw%40mail.gmail.com

--
Best regards,
Daniil Davydov

Attachment Content-Type Size
v32-0002-Documantation-for-parallel-autovacuum.patch text/x-patch 8.6 KB
v32-0001-Parallel-autovacuum.patch text/x-patch 32.0 KB
v31--v32-dif-for-0001.patch text/x-patch 13.7 KB
