Re: POC: Parallel processing of indexes in autovacuum

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: Daniil Davydov <3danissimo(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: 2026-03-30 00:17:21
Message-ID: CAHg+QDehxaJEd1Yp1MpW8UO71xmbasy7t2GZGvqOYwkr0md8DQ@mail.gmail.com
Lists: pgsql-hackers

Hi

On Sat, Mar 28, 2026 at 4:11 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:

> Hi,
>
> On Thu, Mar 26, 2026 at 5:43 AM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 25, 2026 at 12:45 AM Daniil Davydov
> > <3danissimo(at)gmail(dot)com> wrote:
> > >
> > > Searching for arguments in favor of opt-in style, I asked for help
> > > from another person who has been managing the setup of highload
> > > systems for decades. He promised to share his opinion next week.
> >
> > Given that we have one and half weeks before the feature freeze, I
> > think it's better to complete the project first before waiting for
> > his/her comments next week. Even if we finish this feature with the
> > opt-out style, we can hear more opinions on it and change the default
> > behavior as the change would be trivial. What do you think?
> >
>
> Sure, if we can change the default value after the feature freeze, I
> don't mind leaving our parameter in opt-out style for now.
>
> > I've squashed all patches except for the documentation patch as I
> > assume you're working on it. The attached fixup patch contains several
> > changes: using opt-out style, comment improvements, and fixing typos
> > etc.
> >
>
> Thank you very much for the proposed fixes!
> I like the way you changed the nparallel_workers calculation
> (autovacuum.c). Forcing parallel workers to always read the shared
> cost params the first time is a good decision. All the comment changes
> LGTM as well.
>
> The only place I have changed is reloptions.c:
> As you explained, it is not appropriate to use the "overrides" wording
> in the reloption's description, so I decided to revert to the old one.
>
> On Fri, Mar 27, 2026 at 10:54 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Wed, Mar 25, 2026 at 3:43 PM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Given that we have one and half weeks before the feature freeze, I
> > > think it's better to complete the project first before waiting for
> > > his/her comments next week. Even if we finish this feature with the
> > > opt-out style, we can hear more opinions on it and change the default
> > > behavior as the change would be trivial. What do you think?
> > >
> > > I've squashed all patches except for the documentation patch as I
> > > assume you're working on it. The attached fixup patch contains several
> > > changes: using opt-out style, comment improvements, and fixing typos
> > > etc.
> >
> > +1 for enabling this feature by default. When enough CPU is available,
> > vacuuming multiple indexes of a table in parallel in autovacuum
> > definitely speeds things up.
>
> Yes, for sure. But I have concerns that enabling parallel a/v for
> everyone will cause a shortage of parallel workers while processing
> the largest tables.
>
> > Thank you for sending the latest patches. I quickly reviewed the v31
> > patches. Here are some comments.
> >
> > 1/ + {"autovacuum_parallel_workers", RELOPT_TYPE_INT,
> >
> > I haven't looked at the whole thread, but do we all think we need this
> > as a relopt? IMHO, we can wait for field experience and introduce this
> > later.
>
> I think that we should keep both the reloption and the config parameter.
> Getting rid of the reloption would greatly reduce users' ability to
> tune this feature. I'm afraid this may lead to people not using
> parallel autovacuum.
>
> > I'm having a hard time finding a use-case where one wants to
> > disable parallel index processing at the table level. If there was
> > already an agreement, I agree to commit to that decision.
>
> You can read the discussion from [1] up to the current message in
> order to dive into the question.
>
> To make a long story short, I think the most common use case for this
> feature is allowing parallel a/v for 2-3 tables, each of which has
> ~100 indexes. The rest of the tables do not require parallel
> processing (at least it's a much lower priority for them).
>
> At the same time, Masahiko-san thinks that only the system should
> decide which tables will be processed in parallel. The system's
> decision should be based on the number of indexes and a few other
> config parameters (e.g. min_parallel_index_scan_size). Thus,
> potentially many tables could be processed in parallel.
>
> (Both opinions are pretty simplified).
>
> >
> > 2/ + /*
> > + * If 'true' then we are running parallel autovacuum. Otherwise, we
> > + * are running parallel maintenance VACUUM.
> > + */
> > + bool is_autovacuum;
> > +
> >
> > The variable name looks a bit confusing. How about we rely on
> > AmAutoVacuumWorkerProcess() and avoid the bool in shared memory?
>
> This variable is needed for parallel workers, which are taken from the
> bgworkers pool. I.e. AmAutoVacuumWorkerProcess() will return 'false'
> for them. We need the "is_autovacuum" variable in order to understand
> exactly what this process was started for (parallel VACUUM or parallel
> autovacuum).
>
>
> Thanks everyone for the review!
> Please see the updated set of patches:
> As I promised, I created a dedicated chapter for the Parallel Vacuum
> description. Both maintenance VACUUM and autovacuum now refer to this
> chapter.
>
> I am pretty inexperienced in documentation writing, so forgive me if
> something doesn't follow the documentation style conventions.
>
> [1]
> https://www.postgresql.org/message-id/CAJDiXggH1bW%3D4n%2B55CGLvs_sRU4SYNXwYLZ37wvJ5H_3yURSPw%40mail.gmail.com

Thank you for working on this; it's a very useful feature. Sharing a few thoughts:

1. Shouldn't we also cap by max_parallel_workers in
parallel_vacuum_compute_workers, to avoid wasting DSM resources?
2. Is it intentional that other autovacuum workers do not yield cost
limits to the parallel autovacuum workers? Cost limits are first
distributed equally among the autovacuum workers, and the parallel
workers then share that single worker's slice. Therefore, parallel
workers will be heavily throttled. IIUC, this problem doesn't exist
with manual vacuum. If we don't fix this, we should at least document it.
3. Additionally, is there a point where, given the cost limits,
launching additional workers becomes counterproductive compared to
running fewer workers, and should we prevent that?
4. Would it make sense to add a table-level override to disable
parallelism or set the parallel worker count?

I ran some perf tests to show the improvements with parallel vacuum;
the results are shared below.
System Configuration
--------------------
Hardware:
CPU: 16 cores
RAM: 128 GB
Storage: NVMe SSDs
OS: Ubuntu Linux

Workload Description
--------------------
Table: avtest
- 5,000,000 rows
- 9 columns: id (bigint PK), col1-col5 (int), col6 (text),
  col7 (timestamp), padding (text, 50 bytes)
- 8 indexes:
avtest_pkey (col: id) 107 MB
idx_av_col7 (col: col7) 107 MB
idx_av_col2 (col: col2) 56 MB
idx_av_col4 (col: col4) 56 MB
idx_av_col5 (col: col5) 56 MB
idx_av_col1 (col: col1) 56 MB
idx_av_col3 (col: col3) 56 MB
idx_av_col6 (col: col6) 35 MB
- Total size: 1171 MB

Each test iteration:
1. Delete 2,000,000 rows (40%) using: DELETE WHERE id % 5 IN (1, 2)
2. CHECKPOINT to flush dirty pages
3. Trigger autovacuum by setting autovacuum_vacuum_threshold = 100 and
autovacuum_vacuum_scale_factor = 0 on the table
4. Wait for autovacuum to complete (detected via the server log)
5. Re-insert the deleted rows and VACUUM to restore the table for the
   next run

Test Methodology
----------------
Worker configurations tested: 0, 2, 4, 7 parallel workers
(7 is the maximum, nindexes - 1, since the leader always handles one index)

Two experiments were run with different cost-based vacuum delay settings:

Experiment A: cost_limit=200, cost_delay=2ms
Experiment B: cost_limit=60, cost_delay=2ms

Common server settings for both experiments:
shared_buffers = 120 GB (entire dataset fits in shared buffers)
maintenance_work_mem = 1 GB
max_wal_size = 100 GB (prevents checkpoints during vacuum)
min_wal_size = 10 GB
checkpoint_timeout = 1 hour (prevents time-based checkpoints)
wal_buffers = 128 MB
max_parallel_workers = 16
max_worker_processes = 24
autovacuum_naptime = 1s

Between every single run:
1. PostgreSQL server is fully stopped (pg_ctl stop -m fast)
2. OS page cache is dropped (echo 3 > /proc/sys/vm/drop_caches)
3. Server is restarted with a clean log file
4. After DELETE and CHECKPOINT, the server is stopped again, OS caches
dropped again, and the server restarted -- so vacuum starts fully cold
5. The autovacuum_max_parallel_workers GUC is reloaded via pg_ctl reload

Each configuration was tested for 5 iterations.

Timing is extracted from the "system usage" line that autovacuum emits in
the PostgreSQL server log at completion. This line reports elapsed
wall-clock time and CPU time for the leader autovacuum worker process.

Results: Experiment A (cost_limit=200, cost_delay=2ms)
------------------------------------------------------

Workers Iter1 Iter2 Iter3 Iter4 Iter5 Avg(s) Speedup
------- ------ ------ ------ ------ ------ ------ -------
0 66.21 79.11 66.27 77.11 66.30 71.00 1.00x
2 66.55 53.27 52.66 55.74 55.71 56.78 1.25x
4 51.50 51.74 65.07 52.06 70.25 58.12 1.22x
7 50.05 50.35 50.04 50.12 50.07 50.12 1.41x

CPU usage (leader process only):
Workers Avg CPU user Avg CPU sys
------- ----------- ----------
0 3.04s 1.70s
2 1.24s 1.50s
4 0.78s 1.49s
7 0.79s 1.48s

Results: Experiment B (cost_limit=60, cost_delay=2ms)
-----------------------------------------------------

Workers Iter1 Iter2 Iter3 Iter4 Iter5 Avg(s) Speedup
------- ------ ------ ------ ------ ------ ------ -------
0 199.00 195.26 191.44 191.90 191.67 193.85 1.00x
2 160.68 181.33 176.85 167.84 159.47 169.23 1.14x
4 154.02 165.02 174.33 164.16 156.53 162.81 1.19x
7 148.49 158.68 160.66 154.37 149.20 154.28 1.25x

CPU usage (leader process only):
Workers Avg CPU user Avg CPU sys
------- ----------- ----------
0 3.06s 1.90s
2 1.28s 1.72s
4 0.80s 1.69s
7 0.82s 1.68s

*Observations:*

1. Parallel autovacuum provides a consistent speedup. With cost_limit=200
   and 7 workers, vacuum completes 1.41x faster (71s -> 50s). With
   cost_limit=60, the speedup is 1.25x (194s -> 154s).
2. I see the benefit comes from parallelizing index vacuum. With 8
   indexes totaling ~530 MB, parallel workers scan indexes concurrently
   instead of the leader scanning them one by one. The leader's CPU user
   time drops from ~3s to ~0.8s as index work is offloaded.

Thanks,
Satya
