From: | Daniil Davydov <3danissimo(at)gmail(dot)com> |
---|---|
To: | Sami Imseih <samimseih(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: POC: Parallel processing of indexes in autovacuum |
Date: | 2025-05-02 18:49:58 |
Message-ID: | CAJDiXgigcF3CMY86oREdQvxUDaUDFihkK9f78rdEyLTLeB0hdA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, May 2, 2025 at 11:58 PM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> I am generally -1 on the idea of autovacuum performing parallel
> index vacuum, because I always felt that the parallel option should
> be employed in a targeted manner for a specific table. if you have a bunch
> of large tables, some more important than others, a/c may end
> up using parallel resources on the least important tables and you
> will have to adjust a/v settings per table, etc to get the right table
> to be parallel index vacuumed by a/v.
Hm, this is a good point. I think I should clarify one moment - in
practice, there is a common situation when users have one huge table
among all databases (with 80+ indexes created on it). But, of course,
in general there may be few such tables.
But we can still adjust the autovac_idx_parallel_min_rows parameter.
If a table has a lot of dead tuples => it is actively used => table is
important (?).
Also, if the user can really determine the "importance" of each of the
tables - we can provide an appropriate table option. Tables with this
option set will be processed in parallel in priority order. What do
you think about such an idea?
>
> Also, with the TIDStore improvements for index cleanup, and the practical
> elimination of multi-pass index vacuums, I see this being even less
> convincing as something to add to a/v.
If I understood correctly, then we are talking about the fact that
TIDStore can store so many tuples that in fact a second pass is never
needed.
But the number of passes does not affect the presented optimization in
any way. We must think about a large number of indexes that must be
processed. Even within a single pass we can have a 40% increase in
speed.
>
> Now, If I am going to allocate extra workers to run vacuum in parallel, why
> not just provide more autovacuum workers instead so I can get more tables
> vacuumed within a span of time?
For now, only one process can clean up indexes, so I don't see how
increasing the number of a/v workers will help in the situation that I
mentioned above.
Also, we don't consume additional resources during autovacuum in this
patch - total number of a/v workers always <= autovacuum_max_workers.
BTW, see v2 patch, attached to this letter (bug fixes) :-)
--
Best regards,
Daniil Davydov
Attachment | Content-Type | Size |
---|---|---|
v2-0001-WIP-Allow-autovacuum-to-process-indexes-of-single.patch | text/x-patch | 61.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-05-02 18:52:36 | Re: [PoC] Federated Authn/z with OAUTHBEARER |
Previous Message | Tom Lane | 2025-05-02 18:46:50 | Re: SQL functions: avoid making a tuplestore unnecessarily |