| From: | Daniil Davydov <3danissimo(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: POC: Parallel processing of indexes in autovacuum |
| Date: | 2026-03-04 06:58:49 |
| Message-ID: | CAJDiXghjZEAYboGhujgGvY9=RiFD01ERHVVF+NQMuuAKVZDmDQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Tue, Mar 3, 2026 at 5:26 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Mar 1, 2026 at 6:46 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
> >
> > Thus, a/v leader cannot launch any workers if max_parallel_workers is set to 0.
>
> Right. But this fact would actually support that limiting
> autovacuum_max_parallel_workers by max_parallel_workers is more
> appropriate, no?
>
av_max_parallel_workers is really limited by max_parallel_workers only
during shmem init. After that, we can change it to a value higher than
max_parallel_workers, and nothing bad will happen (obviously).
So my point was: why should we have this explicit limitation if it
1) doesn't guard us against anything bad and 2) can be violated at any time
(via ALTER SYSTEM SET ...)?
Now it seems to me that limiting our parameter by max_parallel_workers is
more about grouping logically related parameters than a practical necessity.
> > Even if there is a bug in the code and a/v leader cannot release parallel
> > workers due to occured error, one day it will finish vacuuming and call
> > "proc_exit". During "proc_exit" the "before_shmem_exit_hook" along with
> > the "ReleaseAllParallelWorkers" will be called.
>
> What bugs are you concerned about in this case? I'm not sure what you
> meant by "a/v leader cannot release parallel workers due to occured
> error". It sounds like you mentioned a case where there is a bug in
> AutoVacuumReleaseParallelWorkers() but if there is the bug and the
> leader failed to release parallel workers, we would end up not writing
> these elogs in either case.
>
Not precisely. I mean a bug that causes the a/v leader to not call
AutoVacuumReleaseParallelWorkers in the try/catch block.
I'll continue my thoughts below.
> > I suppose to do the same as we did for try/catch block - add logging inside
> > the "autovacuum_worker_before_shmem_exit" with some unique message.
> > Thus, we will be sure that the workers are released precisely in the
> > "before_shmem_exit_hook".
> >
> > The alternative is to pass some additional information to the
> > "ReleaseAllParallelWorkers" function (to supplement the log it emits), but it
> > doesn't seem like a good solution to me.
>
> I'm not sure if it's important to check how
> AutoVacuumReleaseAllParallelWorkers() has been called (either in
> PG_CATCH() block or by autovacuum_worker_before_shmem_exit()). We
> would end up having to add a unique message to each caller of
> AutoVacuumReleaseAllParallelWorkers() in the future. I guess it's more
> important to make sure that all workers have been released in the end.
>
> In that sense, it would make more sense to check that all workers have
> actually been released (i.e., checking by
> get_parallel_autovacuum_free_workers()) after a parallel vacuum
> instead of checking workers being released by debug logs. That is, we
> can check at each test end if get_parallel_autovacuum_free_workers()
> returns the expected number after disabling parallel autovacuum.
>
Sure, first of all we want to check whether all workers have been
released. But the ability to release them precisely in the try/catch
block is also important: if that fails, the a/v worker can "hold"
these workers until it finishes vacuuming its other tables (which can
take a lot of time). Such a situation will surely degrade performance,
so I think we must check that we can release workers precisely
during ERROR handling. Do you agree?
I understand your concerns about adding a unique log message for each
ReleaseAll call. But I cannot imagine a new situation in which we would need
to release workers in an emergency. If you think it might be possible, I can
propose adding a new optional parameter to the "ReleaseAll" function -
something like "char *context_msg", which would be appended to the elog
inside this function.
> On second thoughts on the "planned" and "reserved", can we consider
> what the patch implemented as "reserved" as the "planned" in
> autovacuum cases? That is, in autovacuum cases, the "planned" number
> considers the number of parallel degrees based on the number of
> indexes (or autovacuum_parallel_workers value) as well as the number
> of workers that have actually been reserved. In cases of
> autovacuum_max_parallel_workers shortage, users would notice by seeing
> logs that enough workers are not planned in the first place against
> the number of indexes on the table. That might be less confusing for
> users rather than introducing a new "reserved" concept in the vacuum
> logs. Also, it slightly helps simplify the codes.
Yeah, it sounds tempting. But in this case we're shifting more responsibility
onto the user. For instance:
If av_max_workers = 5 and there are two a/v leaders, each of which is trying
to launch 3 parallel workers, we will see logs like "3 planned, 3 launched"
and "2 planned, 2 launched". IMHO, such a log doesn't imply that there is a
shortage of workers, i.e. it is the user's responsibility to notice that the
second a/v leader could have launched more than 2 workers when processing a
table with (N + 2) indexes.
In this case, even our previous version of the logging gives more information
to the user: "3 planned, 3 launched", "3 planned, 2 launched".
If we don't want to introduce a new "reserved" concept, maybe we can rename
it to something more intuitive? For example, "n_abandoned" - the number of
workers that we were unable to launch due to an av_max_parallel_workers
shortage. If n_abandoned is 0 and n_launched < n_planned, the user can
conclude that they should increase the max_parallel_workers parameter.
And vice versa, if n_launched == n_planned and n_abandoned > 0, the user
can conclude that they should increase the
autovacuum_max_parallel_workers parameter.
What do you think?
**Comments on the 0001 patch**
> * of the worker list (see above).
> @@ -299,6 +308,8 @@ typedef struct
> WorkerInfo av_startingWorker;
> AutoVacuumWorkItem av_workItems[NUM_WORKITEMS];
> pg_atomic_uint32 av_nworkersForBalance;
> + uint32 av_freeParallelWorkers;
> + uint32 av_maxParallelWorkers;
> } AutoVacuumShmemStruct;
>
> We should use int32 instead of uint32.
I don't mind, but I don't quite understand the reason. We assume that the
minimal value of both variables is 0, so why shouldn't we use an unsigned
data type?
**Comments on the 0003 patch**
> I've attached the proposed changes to the 0003 patch, which includes:
>
> - removal of VacuumCostParams as it's not necessary.
> - comment updates.
> - other cosmetic updates.
Thank you! Most of the proposals look good to me, but I'll edit a few comments.
**Comments on the 0004 patch**
> +#ifdef USE_INJECTION_POINTS
> + /*
> + * If we are parallel autovacuum worker, we can consume delay parameters
> + * during index processing (via vacuum_delay_point call). This logging
> + * allows tests to ensure this.
> + */
> + if (shared->is_autovacuum)
> + elog(DEBUG2,
> + "parallel autovacuum worker cost params: cost_limit=%d, cost_delay=%g, cost_page_miss=%d, cost_page_dirty=%d, cost_page_hit=%d",
> + vacuum_cost_limit,
> + vacuum_cost_delay,
> + VacuumCostPageMiss,
> + VacuumCostPageDirty,
> + VacuumCostPageHit);
> +#endif
>
> While it's true that we use these logs only during the regression
> tests that are enabled only when injection points are also enabled,
> these logs themselves are not related to the injection points. I'd
> recommend writing these logs when the worker refreshes its local delay
> parameters (i.e., in parallel_vacuum_update_shared_delay_params()).
>
I agree (I thought about it too).
> +$node->append_conf('postgresql.conf', qq{
> + max_worker_processes = 20
> + max_parallel_workers = 20
> + max_parallel_maintenance_workers = 20
> + autovacuum_max_parallel_workers = 20
> + log_min_messages = debug2
> + log_autovacuum_min_duration = 0
> + autovacuum_naptime = '1s'
> + min_parallel_index_scan_size = 0
> + shared_preload_libraries=test_autovacuum
> +});
>
> It would be better to set log_autovacuum_min_duration = 0 to the
> specific table instead of setting globally.
>
I agree.
> + uint32 nfree_workers;
> +
> +#ifndef USE_INJECTION_POINTS
> + ereport(ERROR, errmsg("injection points not supported"));
> +#endif
> +
> + nfree_workers = AutoVacuumGetFreeParallelWorkers();
> +
> + PG_RETURN_UINT32(nfree_workers);
> +}
>
> As I commented above, I think we should use int32 for the number of
> parallel free workers. So let's change it here too.
No problem. But again, why do we avoid an unsigned integer?
> +PG_FUNCTION_INFO_V1(get_parallel_autovacuum_free_workers);
> +Datum
> +get_parallel_autovacuum_free_workers(PG_FUNCTION_ARGS)
> +{
> + uint32 nfree_workers;
> +
> +#ifndef USE_INJECTION_POINTS
> + ereport(ERROR, errmsg("injection points not supported"));
> +#endif
> +
>
> I think we don't necessarily need to check the USE_INJECTION_POINTS in
> this function as we already have the check in the tap tests. The
> function itself is actually workable even without injection points.
>
I agree. It is left over from the previous test implementation.
> +# Copyright (c) 2024-2025, PostgreSQL Global Development Group
> +
>
> Please update the copyright year here too.
I keep forgetting about the meson file, sorry.
Thank you very much for the review!
Please see the updated set of patches.
--
Best regards,
Daniil Davydov
| Attachment | Content-Type | Size |
|---|---|---|
| v24-0004-Tests-for-parallel-autovacuum.patch | text/x-patch | 20.7 KB |
| v24-0003-Cost-based-parameters-propagation-for-parallel-a.patch | text/x-patch | 10.4 KB |
| v24-0002-Logging-for-parallel-autovacuum.patch | text/x-patch | 10.2 KB |
| v24-0005-Documentation-for-parallel-autovacuum.patch | text/x-patch | 4.4 KB |
| v24-0001-Parallel-autovacuum.patch | text/x-patch | 19.4 KB |
| v23--v24-diff-for-0004.patch | text/x-patch | 5.4 KB |
| v23--v24-diff-for-0003.patch | text/x-patch | 10.4 KB |
| v23--v24-diff-for-0001.patch | text/x-patch | 877 bytes |