Re: Parallel CREATE INDEX for BRIN indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Date: 2023-07-05 14:33:30
Message-ID: CAEze2Whg43uK9g3CT_qWxWa2PjtcOU_eqTuxjBOOfNuzsPGAMA@mail.gmail.com
Lists: pgsql-hackers

On Wed, 5 Jul 2023 at 00:08, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 7/4/23 23:53, Matthias van de Meent wrote:
> > On Thu, 8 Jun 2023 at 14:55, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >> Hi,
> >>
> >> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
> >> infrastructure (starting workers etc.) is "inspired" by the BTREE code
> >> (i.e. copied from that and massaged a bit to call brin stuff).
> >
> > Nice work.
> >
> >> In both cases _brin_end_parallel then reads the summaries from worker
> >> files, and adds them into the index. In 0001 this is fairly simple,
> >> although we could do one more improvement and sort the ranges by range
> >> start to make the index nicer (and possibly a bit more efficient). This
> >> should be simple, because the per-worker results are already sorted like
> >> that (so a merge sort in _brin_end_parallel would be enough).
> >
> > I see that you manually built the passing and sorting of tuples
> > between workers, but can't we use the parallel tuplesort
> > infrastructure for that? It already has similar features in place and
> > improves code commonality.
> >
>
> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
> do, and the little I knew I managed to forget since I wrote this patch.
> Which similar features do you have in mind?

I was referring to the feature of "emitting a single sorted run of
tuples in the leader backend, based on data gathered in parallel
worker backends". It manages the sort state, on-disk runs, etc., so
that you don't have to manage them yourself.

Adding a new storage format for what is effectively a logical tape
(logtape.{c,h}) and manually merging it seems like a lot of changes
when that functionality is already available, standardized, and
optimized in sortsupport. It also adds one more place that has to be
updated by hand for disk-related changes such as TDE (transparent
data encryption).

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)
