| From: | Daniil Davydov <3danissimo(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: POC: Parallel processing of indexes in autovacuum |
| Date: | 2026-01-16 14:10:54 |
| Message-ID: | CAJDiXggL=J0nV7PfBsMW9+UOU3KUp1jNBM9Gov1JvAX7aG_U1g@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Thu, Jan 15, 2026 at 9:13 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Thank you for updating the patches! Here are review comments.
>
Thank you for the review!
>
> +static void
> +autovacuum_worker_before_shmem_exit(int code, Datum arg)
> +{
> + if (code != 0)
> + AutoVacuumReleaseAllParallelWorkers();
> +
> + Assert(av_nworkers_reserved == 0);
> +}
>
> While adding the assertion here makes sense, the assertion won't work
> in non-assertion builds. I guess it's safer to call
> AutoVacuumReleaseAllParallelWorkers() regardless of the code to ensure
> that no autovacuum workers exit while holding parallel workers.
>
OK, I agree.
> ---
> + before_shmem_exit(autovacuum_worker_before_shmem_exit, 0);
>
> I think it would be better to set this callback later like before the
> main loop of processing the tables as it makes no sense even if we set
> it very early.
Yeah, agree. I'll also add a comment for it, because we already have a
"ReleaseAllParallelWorkers" function call in the try/catch block below.
>
> ---
> + /*
> + * Cap the number of free workers by new parameter's value, if needed.
> + */
> + AutoVacuumShmem->av_freeParallelWorkers =
> + Min(AutoVacuumShmem->av_freeParallelWorkers,
> + autovacuum_max_parallel_workers);
> +
> + if (autovacuum_max_parallel_workers > prev_max_parallel_workers)
> + {
> + /*
> + * If user wants to increase number of parallel autovacuum workers, we
> + * must increase number of free workers.
> + */
> + AutoVacuumShmem->av_freeParallelWorkers +=
> + (autovacuum_max_parallel_workers - prev_max_parallel_workers);
> + }
>
> Suppose the previous autovacuum_max_parallel_workers is 5 and there
> are 2 workers are reserved (i.e., there are 3 free parallel workers),
> if the autovacuum_max_parallel_workers changes to 2, the new
> AutoVacuumShmem->av_freeParallelWorkers would be 2 based on the above
> codes, but I believe that the new number of free workers should be 0
> as there are already 2 workers are running. What do you think? I guess
> we can calculate the new number of free workers by:
>
> Max((autovacuum_max_parallel_workers - prev_max_parallel_workers) +
> AutoVacuumShmem->av_freeParallelWorkers), 0)
>
If av_max_parallel_workers was changed to 2, then we not only set
freeParallelWorkers to 2 but also set maxParallelWorkers to 2.
Thus, when the two previously reserved workers are released, the av leader
will run this code:
/*
* If the maximum number of parallel workers was reduced during execution,
* we must cap available workers number by its new value.
*/
AutoVacuumShmem->av_freeParallelWorkers =
Min(AutoVacuumShmem->av_freeParallelWorkers + nworkers,
AutoVacuumShmem->av_maxParallelWorkers);
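With the numbers from your example the whole sequence is:
1) old max = 5, two workers reserved, av_freeParallelWorkers = 3;
2) autovacuum_max_parallel_workers is changed to 2: av_freeParallelWorkers is
capped to Min(3, 2) = 2 and av_maxParallelWorkers becomes 2;
3) the leader releases its two reserved workers:
av_freeParallelWorkers = Min(2 + 2, 2) = 2.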
I.e. freeParallelWorkers will be left as "2".
The formula you suggested is also correct, but if you have no objections,
I would prefer not to change the existing logic. It seems more reliable to me
when the av leader explicitly handles such a situation.
> ---
> I've attached a patch proposing some minor changes.
>
Thanks! I agree with all the fixes except one:
- * NOTE: We will try to provide as many workers as requested, even if caller
- * will occupy all available workers.
I think that this is a pretty important point. I'll leave this NOTE in the
v19 patch set. Do you mind?
>
> + /*
> + * Number of planned and actually launched parallel workers for all index
> + * scans, or NULL
> + */
> + PVWorkersUsage *workers_usage;
>
> I think that LVRelState can have PVWorkersUsage instead of a pointer to it.
>
Previously, I used a NULL value of this pointer as a flag that we don't need
to log workers usage. Now I'll add a boolean flag for this purpose (IIUC, the
"nplanned > 0" condition is not enough to determine whether we should log
workers usage, because VACUUM PARALLEL can be called without VERBOSE).
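In LVRelState it would look roughly like this (a sketch, the field names are
not final):

    /* Planned/launched parallel workers, accumulated over all index scans */
    PVWorkersUsage workers_usage;

    /*
     * Whether to report workers_usage at the end of table processing
     * (autovacuum, or VACUUM with both PARALLEL and VERBOSE).
     */
    bool        report_workers_usage;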
> ---
> + /*
> + * Allocate space for workers usage statistics. Thus, we explicitly
> + * make clear that such statistics must be accumulated. For now, this
> + * is used only by autovacuum leader worker, because it must log it in
> + * the end of table processing.
> + */
> + vacrel->workers_usage = AmAutoVacuumWorkerProcess() ?
> + (PVWorkersUsage *) palloc0(sizeof(PVWorkersUsage)) :
> + NULL;
>
> I think we can report the worker statistics even in VACUUM VERBOSE
> logs. Currently VACUUM VERBOSE reports the worker usage just during
> index vacuuming but it would make sense to report the overall
> statistics in vacuum logs. It would help make VACUUM VERBOSE logs and
> autovacuum logs consistent.
>
Agree.
> But we don't need to report the worker usage if we didn't use the
> parallel vacuum (i.e., if npanned == 0).
>
As I wrote above, we don't need to log workers usage if the VERBOSE option
is not specified (even if nplanned > 0). Am I missing something?
> ---
> + /* Remember these values, if we asked to. */
> + if (wusage != NULL)
> + {
> + wusage->nlaunched += pvs->pcxt->nworkers_launched;
> + wusage->nplanned += nworkers;
> + }
>
> This code runs after the attempt to reserve parallel workers.
> Consequently, if we fail to reserve any workers due to
> autovacuum_max_parallel_workers, we report the status as if parallel
> vacuum wasn't planned at all. I think knowing the number of workers
> that were planned but not reserved would provide valuable insight for
> users tuning autovacuum_max_parallel_workers.
>
100% agree.
> ---
> + if (vacrel->workers_usage)
> + appendStringInfo(&buf,
> + _("parallel index vacuum/cleanup :
> workers planned = %d, workers launched = %d\n"),
> + vacrel->workers_usage->nplanned,
> + vacrel->workers_usage->nlaunched);
>
> Since these numbers are the total number of workers planned and
> launched, how about changing it to something "parallel index
> vacuum/cleanup: %d workers were planned and %d workers were launched
> in total"?
>
Agree.
>
> +typedef enum AVLeaderFaulureType
> +{
> + FAIL_NONE,
> + FAIL_ERROR,
> + FAIL_FATAL,
> +} AVLeaderFaulureType;
>
> I'm concerned that it is somewhat overwrapped with what injection
> points does as we can set 'error' to injection_points_attach(). For
> the FATAL error, we can terminate the autovacuum worker by using
> pg_terminate_backend() that keeps waiting due to
> injection_point_attach() with action='wait'.
>
Oh, I didn't know about the possibility of testing FATAL errors with
pg_terminate_backend. After reading your message I found this pattern
in signal_autovacuum.pl. This is beautiful.
Thank you, I'll rework these tests.
> ---
> + /*
> + * Injection point to help exercising number of available parallel
> + * autovacuum workers.
> + */
> + INJECTION_POINT("autovacuum-set-free-parallel-workers-num",
> + &AutoVacuumShmem->av_freeParallelWorkers);
>
> This injection point is added to two places. IIUC the purpose of this
> function is to update the free_parallel_workers of InjPointState. And
> that value is taken by get_parallel_autovacuum_free_workers() SQL
> function during the TAP test. I guess it's better to have
> get_parallel_autovacuum_free_workers() function to direcly check
> av_freeParallelWorkers with a proper locking.
>
Agree.
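Something like this, assuming we export a small accessor from autovacuum.c
(AutoVacuumShmem is static there); the function names below are placeholders,
and the actual lock protecting av_freeParallelWorkers may differ:

/* autovacuum.c */
int
AutoVacuumGetFreeParallelWorkers(void)
{
    int     nfree;

    LWLockAcquire(AutovacuumLock, LW_SHARED);
    nfree = AutoVacuumShmem->av_freeParallelWorkers;
    LWLockRelease(AutovacuumLock);

    return nfree;
}

/* SQL-callable wrapper in the test module */
PG_FUNCTION_INFO_V1(get_parallel_autovacuum_free_workers);
Datum
get_parallel_autovacuum_free_workers(PG_FUNCTION_ARGS)
{
    PG_RETURN_INT32(AutoVacuumGetFreeParallelWorkers());
}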
> ---
> It would be great if we could test the av_freeParallelWorkers
> adjustment when max_parallel_maintenance_workers changes.
>
You mean "when autovacuum_max_parallel_workers changes"?
I'll add a test for it.
>
> * 0005 patch
>
> +typedef struct PVSharedCostParams
> +{
> + slock_t spinlock; /* protects all fields below */
> +
> + /* Copies of corresponding parameters from autovacuum leader process */
> + double cost_delay;
> + int cost_limit;
> +} PVSharedCostParams;
>
> Since Parallel workers don't reload the config file I think other
> vacuum delay related parameters such as VacuumCostPage{Miss|Hit|Dirty}
> also needs to be shared by the leader.
>
Yes, I remember it. I didn't add them in the previous patch because it was
experimental. I'll add all appropriate parameters in v19.
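For reference, the extended struct would look roughly like this (the field
list is not final):

typedef struct PVSharedCostParams
{
    slock_t     spinlock;           /* protects all fields below */

    /* Copies of the leader's cost-based delay parameters */
    double      cost_delay;         /* vacuum_cost_delay */
    int         cost_limit;         /* vacuum_cost_limit */
    int         cost_page_hit;      /* vacuum_cost_page_hit */
    int         cost_page_miss;     /* vacuum_cost_page_miss */
    int         cost_page_dirty;    /* vacuum_cost_page_dirty */
} PVSharedCostParams;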
> ---
> + if (!AmAutoVacuumWorkerProcess())
> + {
> + /*
> + * If we are autovacuum parallel worker, check whether cost-based
> + * parameters had changed in leader worker.
> + * If so, vacuum_cost_delay and vacuum_cost_limit will be set to the
> + * values which leader worker is operating on.
> + *
> + * Do it before checking VacuumCostActive, because its value might be
> + * changed after leader's parameters consumption.
> + */
> + parallel_vacuum_fix_cost_based_params();
> + }
>
> We need to add checks to prevent the normal backend running the VACUUM
> command from calling parallel_vacuum_fix_cost_based_params().
>
We already have such a check inside the "fix_cost_based" function:
/* Check whether we are running parallel autovacuum */
if (pv_shared_cost_params == NULL)
return false;
We also have this comment:
* If we are autovacuum parallel worker, check whether cost-based
* parameters had changed in leader worker.
As an alternative, I'll add a comment explicitly saying that the process will
return immediately if it is not a parallel autovacuum participant.
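Something along these lines (the comment wording is just a sketch):

/*
 * If we are a parallel worker launched by an autovacuum leader, pick up
 * the leader's current cost-based delay parameters.  For any other
 * process (including a backend running a plain VACUUM) this call returns
 * immediately, because pv_shared_cost_params is not set for it.
 */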
> IIUC autovacuum parallel workers would call
> parallel_vacuum_fix_cost_based_params() and update their
> vacuum_cost_{delay|limit} every vacuum_delay_point().
>
Yep.
> ---
> +/*
> + * Function to be called from parallel autovacuum worker in order to sync
> + * some cost-based delay parameter with the leader worker.
> + */
> +bool
> +parallel_vacuum_fix_cost_based_params(void)
> +{
>
> The 'fix' doesn't sound right to me as it's not broken actually. How
> about something like parallel_vacuum_update_shared_delay_params?
>
Agree.
> + Assert(IsParallelWorker() && !AmAutoVacuumWorkerProcess());
> +
> + SpinLockAcquire(&pv_shared_cost_params->spinlock);
> +
> + vacuum_cost_delay = pv_shared_cost_params->cost_delay;
> + vacuum_cost_limit = pv_shared_cost_params->cost_limit;
> +
> + SpinLockRelease(&pv_shared_cost_params->spinlock);
>
> IIUC autovacuum parallel workers seems to update their
> vacuum_cost_{delay|limit} every vacuum_delay_point(), which seems not
> good. Can we somehow avoid unnecessary updates?
More precisely, a parallel worker *reads* the leader's parameters at every
delay point. Obviously, this does not mean that the parameters will
necessarily be updated. But I don't see anything wrong with this logic: we
simply fetch the most up-to-date parameters from the leader every time. Of
course, we could introduce some signaling mechanism, but it would have the
same effect as the current code.
> ---
> +
> + if (vacuum_cost_delay > 0 && !VacuumFailsafeActive)
> + VacuumCostActive = true;
> +
>
> Should we consider the case of disabling VacuumCostActive as well?
>
I think we should. I'll add a VacuumUpdateCosts() call instead of writing
this logic manually. IIUC, it will not break anything.
Again, thank you very much for the review!
Please see the v19 patches, which incorporate all of the comments above as
well as zengman's notice. Main changes:
1) Fixes for the before_shmem_exit callback
2) Some comments reworded + pgindent on all files
3) Workers usage can also be reported for VACUUM PARALLEL
4) Deeply reworked tests
5) Propagation (from leader to worker) of all cost-based delay parameters
I have also changed the structure of the patch set - the tests and
documentation are now the last patches to be applied.
--
Best regards,
Daniil Davydov
| Attachment | Content-Type | Size |
|---|---|---|
| v19-0005-Documentation-for-parallel-autovacuum.patch | text/x-patch | 4.4 KB |
| v19-0003-Cost-based-parameters-propagation-for-parallel-a.patch | text/x-patch | 7.4 KB |
| v19-0001-Parallel-autovacuum.patch | text/x-patch | 19.4 KB |
| v19-0002-Logging-for-parallel-autovacuum.patch | text/x-patch | 8.4 KB |
| v19-0004-Tests-for-parallel-autovacuum.patch | text/x-patch | 23.8 KB |