Re: POC: Parallel processing of indexes in autovacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Daniil Davydov <3danissimo(at)gmail(dot)com>
Cc: Sami Imseih <samimseih(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: 2025-08-18 21:03:19
Message-ID: CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 18, 2025 at 1:31 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
>
>
> On Fri, Aug 15, 2025 at 3:41 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
>
> > 2. When an autovacuum worker (not a parallel vacuum worker) that uses
> > parallel vacuum gets SIGHUP, it errors out with the error message
> > "parameter "max_stack_depth" cannot be set during a parallel
> > operation". Autovacuum checks for a configuration file reload in
> > vacuum_delay_point(), and while reloading the configuration file, it
> > attempts to set max_stack_depth in
> > InitializeGUCOptionsFromEnvironment() (which is called by
> > ProcessConfigFileInternal()). However, it cannot change
> > max_stack_depth since the worker is in parallel mode and
> > max_stack_depth doesn't have the GUC_ALLOW_IN_PARALLEL flag. This
> > doesn't happen in regular backends that are using parallel queries
> > because they check for a configuration file reload at the end of each
> > SQL command.
> >
>
> Hm, this is a really serious problem. I see only two ways to solve it
> (neither is really good):
> 1)
> Do not allow processing of the config file during parallel autovacuum
> execution.
>
> 2)
> Teach autovacuum to enter parallel mode only during the index
> vacuum/cleanup phase. I'm a bit wary of this, because the design says
> that we should be in parallel mode during the whole parallel operation.
> But actually, if we can make sure that all launched workers have exited,
> I don't see why we can't just exit parallel mode at the end of
> parallel_vacuum_process_all_indexes.
>
> What do you think about it?

Hmm, given that we're trying to support parallel heap vacuum on
another thread[1] and will probably support it in autovacuum too, it
seems to me that neither of these approaches would work.

Another idea would be to allow autovacuum workers to process the
config file even in parallel mode. GUC changes in the leader would
not affect the parallel vacuum workers, but that seems fine to me. In
the context of autovacuum, only the GUC parameters related to
cost-based delay need to be propagated to the parallel vacuum workers
as well. We would probably need some changes to
compute_parallel_delay() so that parallel workers can compute the
sleep time based on the new vacuum_cost_limit and vacuum_cost_delay
after the leader process (i.e., the autovacuum worker) reloads the
config file.
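
To illustrate the idea, here is a rough standalone sketch (plain C11,
not PostgreSQL code). The leader publishes new cost parameters plus a
generation counter in a shared struct, and each worker refreshes its
local copy when the generation changes before computing its sleep
time. All names here (SharedCostParams, leader_publish_cost_params,
worker_compute_sleep_us, and so on) are hypothetical, and the
proportional sleep formula with a 4x cap is a simplification; the
actual logic in compute_parallel_delay() may well differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state; in PostgreSQL this would live in a DSM segment. */
typedef struct SharedCostParams
{
    atomic_uint generation;     /* bumped by the leader after each reload */
    atomic_int  cost_limit;     /* stands in for vacuum_cost_limit */
    atomic_int  cost_delay_us;  /* stands in for vacuum_cost_delay (microseconds) */
} SharedCostParams;

/* Hypothetical per-worker local copy of the parameters. */
typedef struct WorkerCostState
{
    unsigned seen_generation;   /* last generation this worker has applied */
    int      cost_limit;
    int      cost_delay_us;
    int      cost_balance;      /* cost accumulated since the last sleep */
} WorkerCostState;

static void
shared_cost_params_init(SharedCostParams *shared, int limit, int delay_us)
{
    assert(shared != NULL);
    atomic_init(&shared->generation, 0);
    atomic_init(&shared->cost_limit, limit);
    atomic_init(&shared->cost_delay_us, delay_us);
}

/* Leader side: called after the config file has been reloaded. */
static void
leader_publish_cost_params(SharedCostParams *shared, int limit, int delay_us)
{
    assert(shared != NULL);
    atomic_store(&shared->cost_limit, limit);
    atomic_store(&shared->cost_delay_us, delay_us);
    /* Bump the generation last so workers notice only after the stores. */
    atomic_fetch_add(&shared->generation, 1);
}

/*
 * Worker side: refresh local parameters if the leader published new ones,
 * then return how long to sleep (0 if the balance is below the limit).
 * With back-to-back publishes a worker may briefly mix values from two
 * generations; it converges on the next call. A real patch would need to
 * decide whether that is acceptable.
 */
static long
worker_compute_sleep_us(SharedCostParams *shared, WorkerCostState *w)
{
    unsigned gen;
    long     sleep_us;
    long     max_sleep_us;

    assert(shared != NULL && w != NULL);

    gen = atomic_load(&shared->generation);
    if (gen != w->seen_generation)
    {
        w->cost_limit = atomic_load(&shared->cost_limit);
        w->cost_delay_us = atomic_load(&shared->cost_delay_us);
        w->seen_generation = gen;
    }

    if (w->cost_limit <= 0 || w->cost_delay_us <= 0 ||
        w->cost_balance < w->cost_limit)
        return 0;

    /* Sleep proportionally to the accumulated cost, capped at 4x the delay. */
    sleep_us = (long) w->cost_delay_us * w->cost_balance / w->cost_limit;
    max_sleep_us = (long) w->cost_delay_us * 4;
    if (sleep_us > max_sleep_us)
        sleep_us = max_sleep_us;

    w->cost_balance = 0;
    return sleep_us;
}
```

In this sketch the leader's other GUC changes stay local to the
leader; only the two cost-delay values cross over to the workers.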

>
> Again, thank you for the review. Please see the v10 patches (only 0001
> has been changed):
> 1) Reserve and release workers only inside
> parallel_vacuum_process_all_indexes.
> 2) Add a try/catch block to parallel_vacuum_process_all_indexes, so we
> can release workers even after an error. This required adding a static
> variable to account for the total number of reserved workers
> (av_nworkers_reserved).
> 3) Cap autovacuum_max_parallel_workers by max_worker_processes only
> inside autovacuum code. The assign hook has been removed.
> 4) Use the shmem value for determining the maximum number of parallel
> autovacuum workers (eliminates a race condition between the launcher
> and the leader process).
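
For readers following along, the reserve/release-on-error pattern
described in items 1) and 2) can be sketched roughly as below. This is
a standalone illustration, not the patch: it uses setjmp/longjmp as a
stand-in for PostgreSQL's error-handling macros, returns -1 instead of
re-throwing, and every name except av_nworkers_reserved (the pool
counter, reserve_workers, process_all_indexes, the callbacks) is
hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Counter name taken from the patch description above. */
static int av_nworkers_reserved = 0;

/* Hypothetical pool of free parallel autovacuum workers. */
static int free_parallel_workers = 4;

/* Stand-in for the backend's error-recovery context. */
static jmp_buf error_context;

typedef void (*index_work_fn) (int nworkers);

/* Reserve up to nrequested workers and remember how many we hold. */
static int
reserve_workers(int nrequested)
{
    int n = nrequested < free_parallel_workers ?
        nrequested : free_parallel_workers;

    free_parallel_workers -= n;
    av_nworkers_reserved += n;
    return n;
}

/* Give back everything this process has reserved. */
static void
release_all_reserved_workers(void)
{
    free_parallel_workers += av_nworkers_reserved;
    av_nworkers_reserved = 0;
}

/*
 * Reserve workers, run the index work, and release the workers on both
 * the normal path and the error path. Returns 0 on success, -1 on error.
 */
static int
process_all_indexes(int nrequested, index_work_fn work)
{
    int nworkers;

    assert(work != NULL);
    nworkers = reserve_workers(nrequested);

    if (setjmp(error_context) != 0)
    {
        /* "catch" branch: an error was raised inside work() */
        release_all_reserved_workers();
        return -1;
    }

    work(nworkers);
    release_all_reserved_workers();
    return 0;
}

/* Example callbacks: one that raises an error, one that succeeds. */
static void
failing_work(int nworkers)
{
    (void) nworkers;
    longjmp(error_context, 1);
}

static void
ok_work(int nworkers)
{
    (void) nworkers;
}
```

The static counter is what lets the error path know how many workers
to return without access to the locals of the failed call.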

Thank you for updating the patch! I'll review the new version of the patches.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoAEfCNv-GgaDheDJ%2Bs-p_Lv1H24AiJeNoPGCmZNSwL1YA%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
