Re: Parallel heap vacuum

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2025-09-17 16:23:06
Message-ID: b0e02615-e8df-4e2b-8d09-781db475ed96@vondra.me
Lists: pgsql-hackers

On 9/17/25 18:01, Robert Haas wrote:
> On Wed, Sep 17, 2025 at 7:25 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> I took a quick look at the patch this week. I don't have a very strong
>> opinion on the changes to table AM API, and I somewhat agree with this
>> impression. It's not clear to me why we should be adding callbacks that
>> are AM-specific (and only ever called from that particular AM) to the
>> common AM interface.
>
> We clearly should not do that.
>
>> I keep thinking about how we handle parallelism in index builds. The
>> index AM API did not get a bunch of new callbacks, it's all handled
>> within the existing ambuild() callback. Shouldn't we be doing something
>> like that for relation_vacuum()?
>
> I have a feeling that we might have made the wrong decision there.
> That approach will probably require a good deal of code to be
> duplicated for each AM. I'm not sure what the final solution should
> look like here, but we want the common parts like worker setup to use
> common code, while allowing each AM to insert its own logic in the
> places where that is needed. The challenge in my view is to figure out
> how best to arrange things so as to make that possible.
>

But a lot of the parallel-mode setup is already wrapped in some API (for
example LaunchParallelWorkers, WaitForParallelWorkersToAttach,
CreateParallelContext, ...).

I guess we might "invert" how the parallel builds work - invent a set of
callbacks / API an index AM would need to implement to support parallel
builds. And then those callbacks would be called from a single "parallel
index build" routine.
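
To make the "inverted" idea a bit more concrete, here is a rough sketch of
what such a callback set and a single shared driver might look like. This
is not PostgreSQL code: ParallelBuildCallbacks, generic_parallel_build and
the toy_* functions are all hypothetical names made up for illustration.
The "workers" run sequentially in-process and the shared state is plain
calloc'd memory, so the example stays self-contained. A real version would
presumably sit on top of CreateParallelContext / LaunchParallelWorkers and
DSM instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback set an AM would implement (not a PostgreSQL API). */
typedef struct ParallelBuildCallbacks
{
    size_t (*estimate_shared)(int nworkers);           /* size of AM-specific shared state */
    void   (*init_shared)(void *shared, int nworkers); /* leader initializes that state */
    void   (*worker_main)(void *shared, int worker_id);/* per-worker AM logic */
    long   (*finish)(void *shared, int nworkers);      /* leader combines worker results */
} ParallelBuildCallbacks;

/*
 * Generic driver: owns the common setup/teardown and calls into the AM only
 * where AM-specific logic is needed. Workers are simulated with a plain loop.
 * Returns the AM's result, or -1 if the shared state cannot be allocated.
 */
long
generic_parallel_build(const ParallelBuildCallbacks *cb, int nworkers)
{
    void   *shared;
    long    result;

    assert(cb != NULL && nworkers > 0);

    shared = calloc(1, cb->estimate_shared(nworkers));
    if (shared == NULL)
        return -1;

    cb->init_shared(shared, nworkers);
    for (int i = 0; i < nworkers; i++)
        cb->worker_main(shared, i);     /* stand-in for launching a worker */
    result = cb->finish(shared, nworkers);

    free(shared);
    return result;
}

/* Toy "AM": each worker sums the block numbers it is responsible for. */
#define TOY_NBLOCKS 100

typedef struct ToyShared
{
    int     nworkers;
    long    partial[];      /* one slot per worker */
} ToyShared;

static size_t
toy_estimate_shared(int nworkers)
{
    return sizeof(ToyShared) + (size_t) nworkers * sizeof(long);
}

static void
toy_init_shared(void *shared, int nworkers)
{
    ToyShared  *st = shared;

    st->nworkers = nworkers;
    for (int i = 0; i < nworkers; i++)
        st->partial[i] = 0;
}

static void
toy_worker_main(void *shared, int worker_id)
{
    ToyShared  *st = shared;

    /* Strided assignment: worker k handles blocks k, k + n, k + 2n, ... */
    for (int blk = worker_id; blk < TOY_NBLOCKS; blk += st->nworkers)
        st->partial[worker_id] += blk;
}

static long
toy_finish(void *shared, int nworkers)
{
    ToyShared  *st = shared;
    long        total = 0;

    for (int i = 0; i < nworkers; i++)
        total += st->partial[i];
    return total;
}

static const ParallelBuildCallbacks toy_callbacks = {
    toy_estimate_shared, toy_init_shared, toy_worker_main, toy_finish
};
```

Calling generic_parallel_build(&toy_callbacks, 4) walks through the whole
sequence. Note that even in this toy, nearly everything interesting
(the shared state layout, how work is divided, how results are merged)
ends up behind the callbacks, and the driver itself is only a few lines.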

But I don't think there's a lot of duplicated code, at least based on my
experience with implementing parallel builds for BRIN and GIN.

Look at the BRIN code, for example. Most of the parallel stuff happens
in _brin_begin_parallel. Maybe some of it could be generalized a bit
more (some of the shmem setup?). But most of it is tied to the
AM-specific state / how parallel builds work for that particular AM.

regards

--
Tomas Vondra
