From: Daniil Davydov <3danissimo(at)gmail(dot)com>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: 2025-07-06 08:00:32
Message-ID: CAJDiXgiYiX+azuR76DcVx8fZn57m_4v6cB14-GW34mWa=qudFQ@mail.gmail.com
Lists: pgsql-hackers
Hi,
On Fri, Jul 4, 2025 at 9:21 PM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> The "autovacuum_max_parallel_workers" declared on guc_tables.c mention
> that is capped by "max_worker_process":
> + {
> + {"autovacuum_max_parallel_workers", PGC_SIGHUP, VACUUM_AUTOVACUUM,
> + gettext_noop("Maximum number of parallel autovacuum workers, that can be taken from bgworkers pool."),
> + gettext_noop("This parameter is capped by \"max_worker_processes\" (not by \"autovacuum_max_workers\"!)."),
> + },
> + &autovacuum_max_parallel_workers,
> + 0, 0, MAX_BACKENDS,
> + check_autovacuum_max_parallel_workers, NULL, NULL
> + },
>
> IIUC the code, it is capped by "max_worker_processes", but Masahiko
> mentioned in [1] that it should be capped by max_parallel_workers.
>
Thanks for looking into it!
To be honest, I don't think this parameter should be explicitly
capped at all.
Other parallel operations (for example, parallel index build or VACUUM
(PARALLEL)) just request as many workers as they want without looking
at 'max_parallel_workers', and they don't complain if not all
requested workers were launched.
Thus, even if 'autovacuum_max_parallel_workers' is higher than
'max_parallel_workers', the worst that can happen is that not all
requested workers will be running (which is a common situation).
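For instance, the usual pattern in the existing parallel
infrastructure looks roughly like this (abbreviated; the elog message
here is just illustrative):

    pcxt = CreateParallelContext("postgres", "parallel_vacuum_main", nworkers);
    ...
    LaunchParallelWorkers(pcxt);

    /* No error is raised if fewer workers started than requested. */
    if (pcxt->nworkers_launched < pcxt->nworkers)
        elog(LOG, "launched %d workers (planned: %d)",
             pcxt->nworkers_launched, pcxt->nworkers);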
Users can handle it by looking for "planned vs. launched" log messages
and increasing 'max_parallel_workers' if needed.
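For reference, parallel VACUUM already reports this with a message of
roughly this shape:

    launched 2 parallel vacuum workers for index vacuuming (planned: 4)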
On the other hand, it obviously doesn't make sense to request more
workers than 'max_worker_processes' (moreover, this parameter cannot
be changed as easily as 'max_parallel_workers').
I will keep the 'max_worker_processes' limit, so autovacuum will not
waste time initializing a parallel context if there is no chance that
the request will succeed.
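Concretely, I mean a GUC check hook along these lines (a simplified
sketch of the idea, not the exact patch code):

    static bool
    check_autovacuum_max_parallel_workers(int *newval, void **extra,
                                          GucSource source)
    {
        /* Reject values that the bgworkers pool can never satisfy. */
        if (*newval >= max_worker_processes)
            return false;
        return true;
    }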
But it's worth remembering that in practice the
'autovacuum_max_parallel_workers' parameter will always be implicitly
capped by 'max_parallel_workers'.
What do you think about it?
> But postgresql.conf.sample says that it is limited by
> max_parallel_workers:
> +#autovacuum_max_parallel_workers = 0 # disabled by default and limited by max_parallel_workers
Good catch, I'll fix it.
> ---
>
> We are actually capping autovacuum_max_parallel_workers at
> max_worker_processes - 1, so we can't have 10 max_worker_processes and
> 10 autovacuum_max_parallel_workers. Is that correct?
Yep. The explanation can be found just above in this letter.
> ---
>
> Could unnecessarily locking AutovacuumLock when none of the if's is
> true cause a performance issue here? I don't think this would be a
> serious problem, because this code will only be called if the
> configuration file is changed during autovacuum execution, right? But
> I could be wrong, so I'm just sharing my thoughts on this (still
> learning the [auto]vacuum code).
>
> +
> +/*
> + * Make sure that number of available parallel workers corresponds to the
> + * autovacuum_max_parallel_workers parameter (after it was changed).
> + */
> +static void
> +check_parallel_av_gucs(int prev_max_parallel_workers)
> +{
> + LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
> +
> + if (AutoVacuumShmem->av_available_parallel_workers >
> + autovacuum_max_parallel_workers)
> + {
> + Assert(prev_max_parallel_workers > autovacuum_max_parallel_workers);
> +
>
This function may be called by the a/v launcher while some a/v
workers are already running.
A/v workers can change the
AutoVacuumShmem->av_available_parallel_workers value, so I think we
should acquire the appropriate lock before reading it.
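For example, the worker-side release path touches the same counter, so
an unlocked read in the launcher would race with it. A simplified
sketch of the shape I have in mind (not the exact patch code):

    static void
    ParallelAutoVacuumReleaseWorkers(int nworkers)
    {
        LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
        AutoVacuumShmem->av_available_parallel_workers += nworkers;
        LWLockRelease(AutovacuumLock);
    }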
> Typo on "exeed"
>
> + /*
> + * Number of available workers must not exeed limit.
> + *
> + * Note, that if some parallel autovacuum workers are running at this
> + * moment, available workers number will not exeed limit after releasing
> + * them (see ParallelAutoVacuumReleaseWorkers).
> + */
Oops. I'll fix it.
> ---
>
> I'm not seeing any usage of this macro?
> +/*
> + * RelationGetParallelAutovacuumWorkers
> + * Returns the relation's parallel_autovacuum_workers reloption setting.
> + * Note multiple eval of argument!
> + */
> +#define RelationGetParallelAutovacuumWorkers(relation, defaultpw) \
> + ((relation)->rd_options ? \
> + ((StdRdOptions *) (relation)->rd_options)->autovacuum.parallel_autovacuum_workers : \
> + (defaultpw))
> +
>
Yes, this is a relic of a past implementation. I'll delete this macro.
>
> I've made some tests and I can confirm that it is working correctly
> from what I can see. I think it would be good to start including the
> documentation changes, what do you think?
>
It sounds tempting :)
But perhaps first we should agree on the limitation of the
'autovacuum_max_parallel_workers' parameter.
Please see the v6 patches:
1) Fixed typos in autovacuum.c and postgresql.conf.sample
2) Removed unused macro 'RelationGetParallelAutovacuumWorkers'
--
Best regards,
Daniil Davydov
Attachments:
v6-0001-Parallel-index-autovacuum-with-bgworkers.patch (text/x-patch, 23.2 KB)
v6-0002-Sandbox-for-parallel-index-autovacuum.patch (text/x-patch, 8.6 KB)