Re: Parallel CREATE INDEX for BRIN indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Date: 2023-07-06 14:13:52
Message-ID: 2d7c64de-716e-346e-b01e-03db0c2bc5ac@enterprisedb.com
Lists: pgsql-hackers

On 7/5/23 16:33, Matthias van de Meent wrote:
> ...
>
>> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
>> do, and the little I knew I managed to forget since I wrote this patch.
>> Which similar features do you have in mind?
>
> I was referring to the feature that is "emitting a single sorted run
> of tuples at the leader backend based on data gathered in parallel
> worker backends". It manages the sort state, on-disk runs etc. so that
> you don't have to manage that yourself.
>
> Adding a new storage format for what is effectively a logical tape
> (logtape.{c,h}) and manually merging it seems like a lot of changes if
> that functionality is readily available, standardized and optimized in
> sortsupport; and adds an additional place to manually go through for
> disk-related changes like TDE.
>

Here's a new version of the patch, with three main changes:

1) Adoption of the parallel scan approach, instead of the homegrown
solution with a sequence of TID scans. This is mostly what the 0002
patch did, except for fixing a bug - parallel scan has a "rampdown"
close to the end, and this needs to consider the chunk size too.
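
As an illustration of the ramp-down point in (1), here is a minimal
standalone sketch. It is not code from the patch. It assumes the issue is
that shrinking the chunk size near the end of the scan must not split a
BRIN range between workers, so the reduced chunk stays a multiple of the
range size. The function name and the halving policy are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the next (smaller) chunk size during
 * ramp-down.  The chunk is halved, rounded down to a multiple of
 * pages_per_range, and never allowed to drop below one full range.
 */
uint32_t
rampdown_chunk_size(uint32_t cur_chunk, uint32_t pages_per_range)
{
    uint32_t    half;

    assert(pages_per_range > 0);
    assert(cur_chunk >= pages_per_range);
    assert(cur_chunk % pages_per_range == 0);

    half = cur_chunk / 2;
    half -= half % pages_per_range;     /* keep range alignment */

    return (half < pages_per_range) ? pages_per_range : half;
}
```

With pages_per_range = 128, a chunk of 512 blocks shrinks to 256, while a
chunk of 384 blocks shrinks to 128 rather than 192, so every chunk still
covers whole ranges.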

2) Switches to the parallel tuplesort, as proposed. This turned out to
be easier than I expected - most of the work was in adding methods to
tuplesortvariants.c to allow reading/writing BrinTuple items. The main
limitation is that we need to pass around the length of the tuple
(AFAICS it's not in the BrinTuple itself). I'm not entirely sure about
the memory management aspect of this, and maybe there's a more elegant
solution.

Overall it seems to work - the brin.c code is heavily based on how
nbtsort.c does parallel builds for btree, so hopefully it's fine. At
some point I got a bit confused about which spool to create/use, but it
seems to work.
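
To make the length issue from (2) concrete: when a variable-length item
does not carry its own size, one common approach is to store the length
next to the payload. The sketch below is a generic, standalone
illustration of that idea using a plain byte buffer. It does not use the
tuplesort API, and write_record/read_record are hypothetical names, not
functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append [length][payload] at *off.  Returns 0, or -1 if it does not fit. */
int
write_record(unsigned char *buf, size_t cap, size_t *off,
             const void *payload, size_t len)
{
    assert(buf != NULL && off != NULL);

    if (*off > cap || cap - *off < sizeof(len) ||
        cap - *off - sizeof(len) < len)
        return -1;

    memcpy(buf + *off, &len, sizeof(len));
    memcpy(buf + *off + sizeof(len), payload, len);
    *off += sizeof(len) + len;
    return 0;
}

/*
 * Read the record at *off from the first "end" bytes of buf.  On success,
 * *payload points into buf and *len holds the stored length.
 * Returns 0, or -1 if no complete record is available.
 */
int
read_record(const unsigned char *buf, size_t end, size_t *off,
            const unsigned char **payload, size_t *len)
{
    size_t      n;

    assert(buf != NULL && off != NULL);

    if (*off > end || end - *off < sizeof(n))
        return -1;
    memcpy(&n, buf + *off, sizeof(n));
    if (end - *off - sizeof(n) < n)
        return -1;

    *payload = buf + *off + sizeof(n);
    *len = n;
    *off += sizeof(n) + n;
    return 0;
}
```

The reader never has to inspect the payload to know where the next record
starts, which is the property needed when the item itself has no length
field.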

3) Handling of empty ranges - I ended up ignoring empty ranges in
workers (i.e. those are not written to the tuplesort), and instead the
leader fills them in when reading data from the shared tuplesort.
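
The gap filling described in (3) can be sketched roughly as follows. This
is a simplified standalone model, not the patch code: RangeSummary is a
hypothetical stand-in for a BRIN tuple, and the input array stands in for
the sorted stream the leader reads from the shared tuplesort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BRIN summary; not the real BrinTuple. */
typedef struct RangeSummary
{
    uint32_t    start_block;    /* first heap block of the range */
    int         is_empty;       /* 1 if the leader filled it in */
} RangeSummary;

/*
 * "sorted" holds the non-empty summaries written by workers, ordered by
 * start_block.  Emit one entry per range in [0, nblocks), inserting an
 * empty summary wherever the workers produced nothing.  Returns the
 * number of entries written to "out" (at most out_cap).
 */
size_t
fill_empty_ranges(const RangeSummary *sorted, size_t nsorted,
                  uint32_t nblocks, uint32_t pages_per_range,
                  RangeSummary *out, size_t out_cap)
{
    size_t      nout = 0;
    size_t      i = 0;
    uint64_t    next;           /* 64-bit so the loop cannot wrap */

    assert(pages_per_range > 0);

    for (next = 0; next < nblocks && nout < out_cap; next += pages_per_range)
    {
        if (i < nsorted && sorted[i].start_block == next)
            out[nout] = sorted[i++];        /* summary from a worker */
        else
        {
            out[nout].start_block = (uint32_t) next;
            out[nout].is_empty = 1;         /* gap: leader fills it in */
        }
        nout++;
    }
    return nout;
}
```

For a 40-block table with 8 pages per range and worker summaries for
blocks 8 and 24 only, the leader ends up with five entries, three of them
empty.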

One thing I was wondering about is whether it might be better to allow
the workers to process overlapping ranges, and then let the leader
merge the summaries. That would mean we might not need the tableam.c
changes at all, but the leader would need to do more work (although the
BRIN indexes tend to be fairly small). The main reason that got me
thinking about this is that we have pretty much no tests for the union
procedures, because triggering that is really difficult. But for
parallel index builds that'd be much more common.
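
For illustration, the leader-side merge could look roughly like the
sketch below. It assumes a simplified minmax-style summary over int32
values. MinMaxSummary, minmax_union and merge_overlapping are
hypothetical names, and the real opclass union procedures handle more
cases than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minmax-style summary for one page range. */
typedef struct MinMaxSummary
{
    uint32_t    start_block;    /* first heap block of the range */
    int         has_nulls;      /* 1 if any NULL was seen */
    int32_t     min_val;
    int32_t     max_val;
} MinMaxSummary;

/* Widen dst so that it also covers everything summarized by src. */
static void
minmax_union(MinMaxSummary *dst, const MinMaxSummary *src)
{
    assert(dst->start_block == src->start_block);

    if (src->min_val < dst->min_val)
        dst->min_val = src->min_val;
    if (src->max_val > dst->max_val)
        dst->max_val = src->max_val;
    dst->has_nulls = dst->has_nulls || src->has_nulls;
}

/*
 * "sorted" is ordered by start_block and may contain several partial
 * summaries for the same range (one per worker that touched it).
 * Collapse them in place and return the number of distinct ranges.
 */
size_t
merge_overlapping(MinMaxSummary *sorted, size_t n)
{
    size_t      out = 0;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (out > 0 && sorted[out - 1].start_block == sorted[i].start_block)
            minmax_union(&sorted[out - 1], &sorted[i]);
        else
            sorted[out++] = sorted[i];
    }
    return out;
}
```

Because the input is already sorted by range, the leader only ever unions
a summary with its immediate predecessor, so the extra work is linear in
the number of summaries.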

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: 0001-parallel-CREATE-INDEX-for-BRIN-20230706.patch (text/x-patch, 49.7 KB)