Re: POC: Parallel processing of indexes in autovacuum

From:	Daniil Davydov <3danissimo(at)gmail(dot)com>
To:	Sami Imseih <samimseih(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: POC: Parallel processing of indexes in autovacuum
Date:	2025-05-02 18:49:58
Message-ID:	CAJDiXgigcF3CMY86oREdQvxUDaUDFihkK9f78rdEyLTLeB0hdA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, May 2, 2025 at 11:58 PM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> I am generally -1 on the idea of autovacuum performing parallel
> index vacuum, because I always felt that the parallel option should
> be employed in a targeted manner for a specific table. If you have a bunch
> of large tables, some more important than others, a/v may end
> up using parallel resources on the least important tables and you
> will have to adjust a/v settings per table, etc. to get the right table
> to be parallel index vacuumed by a/v.

Hm, this is a good point. I think I should clarify one thing: in
practice, it is common for users to have one huge table
across all databases (with 80+ indexes created on it). But, of course,
in general there may be a few such tables.
But we can still adjust the autovac_idx_parallel_min_rows parameter.
If a table has a lot of dead tuples => it is actively used => the table is
important (?).
Also, if the user really can determine the "importance" of each
table, we can provide an appropriate table option. Tables with this
option set will be processed in parallel in priority order. What do
you think about this idea?
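
To make the selection idea concrete, here is a rough sketch in Python. It
is only an illustration, not code from the patch: the TableStats type, the
parallel_priority field (standing in for the proposed table option), and the
function name are all hypothetical, and it assumes that
autovac_idx_parallel_min_rows acts as a lower bound on the number of dead
tuples.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableStats:
    name: str
    dead_tuples: int
    # Hypothetical per-table option; lower value = more important.
    # None means the option is not set for this table.
    parallel_priority: Optional[int] = None


def pick_parallel_candidates(tables: List[TableStats],
                             min_rows: int) -> List[TableStats]:
    """Return tables eligible for parallel index processing, in order.

    Assumption: min_rows plays the role of autovac_idx_parallel_min_rows
    and is compared against the dead tuple count.
    """
    eligible = [t for t in tables if t.dead_tuples >= min_rows]
    # Tables with the option set come first, ordered by priority;
    # the rest follow, ordered by dead tuple count (largest first).
    return sorted(
        eligible,
        key=lambda t: (
            t.parallel_priority is None,
            t.parallel_priority if t.parallel_priority is not None else 0,
            -t.dead_tuples,
        ),
    )
```

With such an ordering, a table below the threshold is never considered, and
a user who marks a table as important gets it handled before tables that
merely have many dead tuples.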

>
> Also, with the TIDStore improvements for index cleanup, and the practical
> elimination of multi-pass index vacuums, I see this being even less
> convincing as something to add to a/v.

If I understood correctly, the point is that TIDStore can store so
many tuples that a second pass is practically never needed.
But the number of passes does not affect the proposed optimization in
any way. What matters is the large number of indexes that must be
processed. Even within a single pass we can get a 40% increase in
speed.
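
A toy model may help show why the number of passes is irrelevant here. The
sketch below (Python, not taken from the patch) estimates the duration of a
single index-vacuum pass when the indexes are spread over several processes.
The function name, the per-index costs, and the greedy longest-first
assignment are assumptions made for illustration; the model ignores I/O
contention and cost-based delay, so it does not predict real speedups.

```python
import heapq
from typing import List


def single_pass_time(index_costs: List[float], workers: int) -> float:
    """Estimated duration of one pass over all indexes of a table.

    Each index is assigned to the currently least-loaded process,
    taking the most expensive indexes first.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    loads = [0.0] * workers
    heapq.heapify(loads)
    for cost in sorted(index_costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    # The pass ends when the busiest process finishes.
    return max(loads)
```

For example, with 80 indexes of equal cost 1.0, single_pass_time(costs, 1)
gives 80.0 and single_pass_time(costs, 4) gives 20.0 in this idealized
model, even though there is only one pass in both cases.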

>
> Now, if I am going to allocate extra workers to run vacuum in parallel, why
> not just provide more autovacuum workers instead so I can get more tables
> vacuumed within a span of time?

For now, only one process can clean up the indexes of a table, so I don't
see how increasing the number of a/v workers will help in the situation
that I mentioned above.
Also, we don't consume additional resources during autovacuum in this
patch: the total number of a/v workers is always <= autovacuum_max_workers.
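
The invariant can be pictured with a small sketch. This is Python for
illustration only and does not show how the patch implements it: the
WorkerBudget class and its methods are hypothetical, and max_workers simply
stands in for autovacuum_max_workers.

```python
class WorkerBudget:
    """Shared budget: regular and parallel a/v workers draw from one pool."""

    def __init__(self, max_workers: int) -> None:
        self.max_workers = max_workers  # stands in for autovacuum_max_workers
        self.in_use = 0

    def reserve(self, requested: int) -> int:
        """Grant as many workers as the remaining budget allows."""
        granted = max(0, min(requested, self.max_workers - self.in_use))
        self.in_use += granted
        return granted

    def release(self, count: int) -> None:
        """Return workers to the pool once their work is done."""
        self.in_use = max(0, self.in_use - count)
```

A request for more workers than are free is simply granted fewer (possibly
zero), so in_use can never exceed max_workers.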

BTW, see the v2 patch attached to this email (bug fixes) :-)

--
Best regards,
Daniil Davydov

Attachment	Content-Type	Size
v2-0001-WIP-Allow-autovacuum-to-process-indexes-of-single.patch	text/x-patch	61.8 KB

In response to

Re: POC: Parallel processing of indexes in autovacuum at 2025-05-02 16:58:30 from Sami Imseih

Responses

Re: POC: Parallel processing of indexes in autovacuum at 2025-05-02 20:17:34 from Sami Imseih