| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Daniil Davydov <3danissimo(at)gmail(dot)com> |
| Cc: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: POC: Parallel processing of indexes in autovacuum |
| Date: | 2026-03-31 21:19:41 |
| Message-ID: | CAD21AoAvZc6Rwi1hZ7x+U3vz7AMMSpcbQ2JBn6+WmQp-3yfKMg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Mar 31, 2026 at 7:18 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Mar 31, 2026 at 2:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've made some changes to the documentation part, merged two patches
> > into one, and updated the commit message. Please review the attached
> > patch.
> >
>
> Great, thank you very much!
>
> Again, I don't know how to write the documentation well, so you can ignore
> my comments:
>
> > + <command>VACUUM</command> can perform index vacuuming and index cleanup
> Don't we need to mention autovacuum here too? I thought that VACUUM in the
> context means "manual VACUUM command".
I think that the documentation explains that the autovacuum daemon is
a worker automatically executing VACUUM and ANALYZE commands.
>
> > + ...applies specifically to the index vacuuming and index cleanup phases...
> Maybe we can refer to "vacuum-phases" here?
Agreed.
>
> All other changes look good to me.
>
> !!!
> > Searching for arguments in
> > favor of opt-in style, I asked for help from another person who has been
> > managing the setup of highload systems for decades. He promised to share his
> > opinion next week.
>
> I talked to Anton Doroshkevich today.
Thank you for sharing!
> He confirmed that as a rule there are *hundreds of thousands* of tables in the
> system, the vast majority of which do not need to be vacuumed in parallel mode.
I'm still struggling to see the technical justification; why would a
user want to avoid parallel vacuuming on eligible tables if they have
already explicitly allowed the system to use more resources by setting
autovacuum_max_parallel_workers to >0? If resource contention occurs,
it is typically a sign that the global parameters need re-tuning. As I
mentioned, the same contention can occur even with an opt-in style if
multiple tables are manually configured.
Also, I'm concerned that an opt-in style could confuse users, since
parallel vacuum is enabled by default in the VACUUM command.
> He also suggested the following: let the reloption override the value of the
> GUC parameter. I.e. even if the av_max_parallel_workers parameter is 0, the user
> can still set av_parallel_workers to 10 for some table, and autovacuum
> will process this table in parallel.
>
> I remember that you want to use the GUC parameter as a global switch, and this
> approach will break this logic. But according to Anton's words, it is okay if
> the GUC parameter cannot disable parallel a/v for all tables instantly. It will
> become an administrator's responsibility to manually turn off parallel a/v for
> several tables (again, it is completely OK). Thus, this feature can be handy
> for all use cases.
While some autovacuum parameters do override GUCs, those are typically
local to the process (like cost delay). Parallel workers, however, are
a shared system-wide resource. In a multi-tenant environment, allowing
a single table's reloption to bypass the
autovacuum_max_parallel_workers = 0 limit could lead to unexpected
exhaustion of the worker pool. I think that this GUC should act as a
reliable global switch for resource management.
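To make the difference between the two semantics concrete, here is a rough sketch in Python. It is not PostgreSQL code: the function names are made up for illustration, `None` stands for "no reloption set on the table", and it ignores other limits such as the number of indexes or how many workers are actually free at the moment.

```python
from typing import Optional


def workers_guc_as_cap(guc_max: int, reloption: Optional[int]) -> int:
    """GUC as a hard global limit: a per-table value can only lower the count."""
    if guc_max <= 0:
        return 0  # global switch is off, so no table runs in parallel
    if reloption is None:
        return guc_max  # table has no setting of its own
    return max(0, min(reloption, guc_max))


def workers_reloption_overrides(guc_max: int, reloption: Optional[int]) -> int:
    """Alternative semantics: an explicit per-table value wins over the GUC."""
    if reloption is not None:
        return max(0, reloption)
    return max(0, guc_max)


# Three tables each ask for 10 workers while the GUC is set to 0.
requests = [10, 10, 10]
demand_with_cap = sum(workers_guc_as_cap(0, r) for r in requests)
demand_with_override = sum(workers_reloption_overrides(0, r) for r in requests)
```

With the GUC acting as a cap, setting it to 0 brings the total demand to zero in one step. With the override semantics, the same three tables still ask for their workers, and the administrator has to reset each reloption individually.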
> I hope it doesn't look like adapting to the needs of a specific user.
> A lot of super-large productions are migrating to postgres now, and I believe
> that we should ensure their comfort too.
I'm not prioritizing one specific use case over another. I believe
that there are also users who want to use parallel vacuum on hundreds
of thousands of tables. We should look for a better solution and
evaluate it from multiple perspectives, such as usability,
robustness, and consistency with existing features and behaviors.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com