Re: Parallel heap vacuum

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Parallel heap vacuum
Date: 2025-09-18 00:24:15
Message-ID: bf57277b-3e9f-48b0-95e0-e70c6a0fa855@vondra.me
Lists: pgsql-hackers

On 9/18/25 01:18, Masahiko Sawada wrote:
> On Wed, Sep 17, 2025 at 4:25 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> On 9/8/25 17:40, Melanie Plageman wrote:
>>> On Wed, Aug 27, 2025 at 2:30 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>
>>>> On Tue, Aug 26, 2025 at 8:55 AM Melanie Plageman
>>>> <melanieplageman(at)gmail(dot)com> wrote:
>>>>
>>>>> If you do parallel worker setup below heap_vacuum_rel(), then how are
>>>>> you supposed to use those workers to do non-heap table vacuuming?
>>>>
>>>> IIUC non-heap table AMs can call parallel_vacuum_init() in their
>>>> relation_vacuum callback implementation in order to initialize
>>>> parallel table vacuum, parallel index vacuum, or both.
>>>
>>> Yep, it's just that that is more like a library helper function.
>>> Currently, in master, the flow is:
>>>
>>> leader does:
>>> heap_vacuum_rel()
>>>   dead_items_alloc()
>>>     parallel_vacuum_init()
>>>       passes parallel_vacuum_main() to parallel context
>>>   lazy_scan_heap()
>>>   ...
>>>   parallel_vacuum_process_all_indexes()
>>>     LaunchParallelWorkers()
>>>     ...
>>>     parallel_vacuum_process_[un]safe_indexes()
>>>
>>> For the leader, I don't remember where your patch moves
>>> LaunchParallelWorkers() -- since it would have to move to before the
>>> first heap vacuuming phase. When I was imagining what would make the
>>> most sense for making parallelism agnostic of the heap, it seemed like
>>> moving LaunchParallelWorkers() above heap_vacuum_rel(). But I
>>> recognize that is a pretty big change, and I haven't tried it, so I
>>> don't know all the complications. (Sounds like you mention some
>>> below.)
>>>
>>> Basically, right now in master, we expose some functions for parallel
>>> index vacuuming that people can call in their own table AM
>>> implementation of vacuuming, but we don't invoke any of them in
>>> table-AM agnostic code. That may work fine for just index vacuuming,
>>> but now that we are trying to make parts of the table vacuuming
>>> itself extensible, the interface feels more awkward.
>>>
>>> I won't push the idea further because I'm not even sure if it works,
>>> but the thing that felt the most natural to me was pushing parallelism
>>> above the table AM.
>>>
>>> ...
>>
>> I took a quick look at the patch this week. I don't have a very strong
>> opinion on the changes to the table AM API, and I somewhat agree with
>> this impression. It's not clear to me why we should be adding callbacks
>> that are AM-specific (and only ever called from that particular AM) to
>> the common AM interface.
>
> I agree that we should not add AM-specific callbacks to the AM interface.
>
>> I keep thinking about how we handle parallelism in index builds. The
>> index AM API did not get a bunch of new callbacks; it's all handled
>> within the existing ambuild() callback. Shouldn't we be doing something
>> like that for relation_vacuum()?
>>
>> That is, each AM-specific relation_vacuum() would be responsible for
>> starting / stopping workers and all that. relation_vacuum would need
>> some flag indicating it's OK to use parallelism, of course. And we
>> might add some shared infrastructure to make that more convenient.
>>
>> This is roughly how index builds handle parallelism. It's not perfect
>> (e.g. plan_create_index_workers() can do unexpected stuff for
>> non-btree indexes).
>
> In parallel vacuum cases, I believe we need to consider that
> vacuumparallel.c (particularly ParallelVacuumContext) already
> effectively serves as shared infrastructure. It handles several core
> functions, including the initialization of the shared dead-items
> storage (a shared TidStore), worker management, buffer/WAL usage
> tracking, and vacuum-delay settings, and it provides APIs for parallel
> index vacuuming/cleanup. If we were to implement parallel heap vacuum
> naively in vacuumlazy.c, we would likely end up with significant code
> duplication.
>
> To avoid this redundancy, my idea is to extend vacuumparallel.c to
> support parallel table vacuum. This approach would allow table AMs to
> use vacuumparallel.c for parallel table vacuuming, parallel index
> vacuuming, or both. While I agree that adding new table AM callbacks
> for parallel table vacuum to the table AM interface isn't a good idea,
> I think passing the necessary callbacks for parallel heap vacuum to
> vacuumparallel.c through parallel_vacuum_init() follows a similar
> principle. This means that all information needed for parallel
> table/index vacuuming would remain within vacuumparallel.c, and table
> AMs could trigger parallel table vacuum, parallel index vacuuming, and
> parallel index cleanup through the APIs provided by vacuumparallel.c.
>
> The alternative approach, as you suggested, would be to have the table
> AM's relation_vacuum() handle the parallel processing implementation
> while utilizing the infrastructure and APIs provided by
> vacuumparallel.c. Based on the feedback received, I'm now evaluating
> what shared information and helper functions would be necessary for
> this latter approach.
>

OK
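
To make sure I understand the shape of the first option: the table AM
would keep calling parallel_vacuum_init(), but hand it the callbacks
for the table-scan phase on top of the existing arguments (indrels,
nindexes, nrequested_workers, ...). Roughly like this, with all names
invented here for illustration (not taken from the patch):

/* hypothetical sketch, names made up */
typedef struct ParallelTableVacuumCallbacks
{
	/* how much shared memory the AM needs for its scan state */
	Size		(*estimate_shmem) (Relation rel, int nworkers);

	/* initialize the AM-specific state in the DSM segment */
	void		(*init_shmem) (Relation rel, void *space);

	/* per-worker entry point for the table-scan phase */
	void		(*scan_worker) (Relation rel, void *space);
} ParallelTableVacuumCallbacks;

That way vacuumparallel.c keeps owning the DSM setup, worker
management, buffer/WAL accounting and the delay settings, and the
generic tableam interface doesn't need to know about any of it. Is
that roughly the idea?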

>> I also repeated the stress test / benchmark, measuring the impact with
>> different numbers of indexes, amounts of deleted tuples, etc. Attached
>> is a PDF summarizing the results for each part of the patch series
>> (with different numbers of workers). For these tests, the improvements
>> are significant only with no indexes - then the 0004 patch saves
>> 30-50%. But as soon as indexes are added, the index cleanup starts to
>> dominate.
>>
>> It's just an assessment; I'm not saying we shouldn't parallelize the
>> heap cleanup. I assume there are workloads for which the patches will
>> make much more of a difference. But what would such cases look like?
>> If I want to maximize the impact of this patch series, what should I
>> do?
>
> Another use case is executing vacuum without index cleanup, to freeze
> XIDs faster on a large table.
>

Good point. I forgot about this case, where we skip index cleanup. A
VACUUM (FREEZE, INDEX_CLEANUP off) on a large table spends nearly all
its time in the heap scan, so that's presumably where these patches
would help the most.

>>
>> FWIW the patch needs rebasing; there's a bit of bitrot. It wasn't very
>> extensive, so I did that (in order to run the tests). Attached is the
>> result as v19.
>
> Thank you.
>
>> This also reminds me that I've heard a couple of complaints that we
>> don't allow parallelism in autovacuum. We have parallel index vacuum,
>> but it's disabled in autovacuum (and users really don't want to run
>> manual vacuum).
>>
>> That question is almost entirely orthogonal to this patch, of course.
>> I'm not suggesting this patch has to (or even should) do anything
>> about it. But I wonder if you have any thoughts on that?
>>
>> I believe the reason why parallelism is disabled in autovacuum is that
>> we want autovacuum to be a background process, with minimal disruption
>> to the user workload. It probably wouldn't be that hard to allow
>> autovacuum to do parallel stuff, but it feels similar to adding more
>> autovacuum workers. That's rarely the solution on its own, without
>> also increasing the cost limit.
>
> A patch enabling autovacuum to use parallel vacuum capabilities has
> already been proposed[1], and I've been reviewing it. There seems to
> be consensus that users would benefit from parallel index vacuuming
> during autovacuum in certain scenarios. As an initial implementation,
> parallel vacuum in autovacuum will be disabled by default, but users
> can enable it by specifying parallel degrees through a new storage
> option for specific tables where they want parallel vacuuming.
>

Thanks, the v18 patch seems generally sensible, but I only skimmed it.
I'll try to take a closer look ...

I agree it probably makes sense to disable that by default. One thing
that bothers me is that this makes predicting autovacuum behavior even
more difficult (but that's not the fault of the patch).

> For your information, while the implementation itself is relatively
> straightforward, we're still facing one unresolved issue: the system
> doesn't support reloading the configuration file during parallel mode,
> but autovacuum needs that to update the vacuum cost-delay
> parameters[2].
>

Hmmm, that's annoying :-(

If it happens only for max_stack_depth (when setting it based on the
environment), maybe we could simply skip that when in parallel mode? But
it's ugly, and we'll probably have the same issue with any other option
in the file that is not marked GUC_ALLOW_IN_PARALLEL ...
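
FWIW by "skip" I mean something like the following (hand-wavy and
untested, and the actual check in guc.c may well look different):

	/* in set_config_option(), around the existing parallel-mode check */
	if (IsInParallelMode() && changeVal && action != GUC_ACTION_SAVE &&
		(record->flags & GUC_ALLOW_IN_PARALLEL) == 0)
	{
		/*
		 * On a SIGHUP reload, silently skip the assignment instead of
		 * failing, so that the GUC_ALLOW_IN_PARALLEL options (like the
		 * cost-delay parameters) can still be applied.
		 */
		if (context == PGC_SIGHUP)
			return -1;	/* "not applied due to context" */

		ereport(elevel,
				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
				 errmsg("cannot set parameters during a parallel operation")));
	}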

regards

--
Tomas Vondra
