Re: POC: Parallel processing of indexes in autovacuum

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Daniil Davydov <3danissimo(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: 2025-05-06 00:21:07
Message-ID: CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com
Lists: pgsql-hackers

> On Sat, May 3, 2025 at 1:10 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
> >
> > On Sat, May 3, 2025 at 5:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > In current implementation, the leader process sends a signal to the
> > > > a/v launcher, and the launcher tries to launch all requested workers.
> > > > But the number of workers never exceeds `autovacuum_max_workers`.
> > > > Thus, we will never have more a/v workers than in the standard case
> > > > (without this feature).
> > >
> > > I have concerns about this design. When autovacuuming on a single
> > > table consumes all available autovacuum_max_workers slots with
> > > parallel vacuum workers, the system becomes incapable of processing
> > > other tables. This means that when determining the appropriate
> > > autovacuum_max_workers value, users must consider not only the number
> > > of tables to be processed concurrently but also the potential number
> > > of parallel workers that might be launched. I think it would make more
> > > sense to maintain the existing autovacuum_max_workers parameter while
> > > introducing a new parameter that would either control the maximum
> > > number of parallel vacuum workers per autovacuum worker or set a
> > > system-wide cap on the total number of parallel vacuum workers.
> > >
> >
> > For now we have max_parallel_index_autovac_workers - this GUC limits
> > the number of parallel a/v workers that can process a single table. I
> > agree that the scenario you provided is problematic.
> > The proposal to limit the total number of supportive a/v workers seems
> > attractive to me (I'll implement it as an experiment).
> >
> > It seems to me that this question is becoming a key one. First we need
> > to determine the role of the user in the whole scheduling mechanism.
> > Should we allow users to determine priority? Will this priority apply
> > only within a single vacuuming cycle, or will it be more 'global'?
> > I guess I don't have enough expertise to determine this alone. I will
> > be glad to receive any suggestions.
>
> What I roughly imagined is that we don't need to change the entire
> autovacuum scheduling; rather, each autovacuum worker would decide
> whether or not to use parallel vacuum during its vacuum operation
> based on GUC parameters (having a global effect) or storage parameters
> (having an effect on the particular table). The criteria for triggering
> parallel vacuum in autovacuum might need to be somewhat pessimistic so
> that we don't unnecessarily use parallel vacuum on many tables.
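
If we went the GUC route, I imagine the check could look roughly like
this (a sketch only, not from the patch; the GUC name and the shared
counter field are made up for illustration, the rest is existing
infrastructure in autovacuum.c):

#include "postgres.h"
#include "postmaster/autovacuum.h"
#include "storage/lwlock.h"

/* Hypothetical GUC: system-wide cap on the number of parallel vacuum
 * workers that all autovacuum workers combined may use. */
int autovacuum_max_parallel_workers = 0;

/*
 * Sketch: how many parallel vacuum workers one autovacuum worker may
 * request for a table, given what the rest of autovacuum is already
 * using.  av_parallel_workers_in_use is a hypothetical counter in
 * autovacuum's shared-memory state.
 */
static int
av_parallel_worker_budget(int nrequested)
{
    int in_use;

    LWLockAcquire(AutovacuumLock, LW_SHARED);
    in_use = AutoVacuumShmem->av_parallel_workers_in_use;
    LWLockRelease(AutovacuumLock);

    return Min(nrequested,
               Max(0, autovacuum_max_parallel_workers - in_use));
}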

Perhaps we should only provide a reloption, so that only tables the user
has explicitly opted in via the reloption can be autovacuumed in parallel?

This gives a targeted approach. Of course, if several of these opted-in
tables come up for autovacuum at the same time, some may not get all the
workers they ask for, but that is no different from manually running
parallel vacuums on those tables at the same time.
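
As a sketch of the reloption approach (assuming a hypothetical
parallel_autovacuum_workers reloption stored in StdRdOptions, and
reusing the budget helper from the sketch above):

#include "postgres.h"
#include "utils/rel.h"

/*
 * Sketch: decide how many parallel vacuum workers autovacuum should
 * use for a table.  Only tables that opted in through the reloption
 * are ever vacuumed in parallel; everything else keeps today's
 * serial behavior.
 */
static int
av_parallel_workers_for_rel(Relation rel)
{
    StdRdOptions *opts = (StdRdOptions *) rel->rd_options;

    /* Reloption not set: no parallel vacuum, as today.  The
     * parallel_autovacuum_workers field is hypothetical. */
    if (opts == NULL || opts->parallel_autovacuum_workers <= 0)
        return 0;

    /* Contended case: clamp to whatever budget remains free. */
    return av_parallel_worker_budget(opts->parallel_autovacuum_workers);
}

Tables would then opt in with something like
ALTER TABLE big_table SET (parallel_autovacuum_workers = 4);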

What do you think?


Sami

