Re: Parallel CREATE INDEX for BRIN indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Date: 2023-11-29 14:42:37
Message-ID: CAEze2Wj5LneVaDgm1+43eHLGPxh_e7f23viA12cXm=k+igz5Kw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 28 Nov 2023 at 18:59, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 11/28/23 16:39, Matthias van de Meent wrote:
> > On Thu, 23 Nov 2023 at 14:35, Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >> On 11/23/23 13:33, Matthias van de Meent wrote:
> >>> The union operator may leak (lots of) memory, so I think it makes
> >>> sense to keep a context around that can be reset after we've extracted
> >>> the merge result.
> >>>
> >>
> >> But does the current code actually achieve that? It does create a "brin
> >> union" context, but then it only does this:
> >>
> >> /* Use our own memory context to avoid retail pfree */
> >> cxt = AllocSetContextCreate(CurrentMemoryContext,
> >> "brin union",
> >> ALLOCSET_DEFAULT_SIZES);
> >> oldcxt = MemoryContextSwitchTo(cxt);
> >> db = brin_deform_tuple(bdesc, b, NULL);
> >> MemoryContextSwitchTo(oldcxt);
> >>
> >> Surely that does not limit the amount of memory used by the actual union
> >> functions in any way?
> >
> > Oh, yes, of course. For some reason I thought that covered the calls
> > to the union operator function too, but it indeed only covers
> > deserialization. I do think it is still worthwhile to not do the
> > create/delete cycle, but won't hold the patch back for that.
> >
>
> I think the union_tuples() changes are better left for a separate patch.
>
> >>>> However, I don't think the number of union_tuples calls is likely to be
> >>>> very high, especially for large tables. Because we split the table into
> >>>> 2048 chunks, and then cap the chunk size by 8192. For large tables
> >>>> (where this matters) we're likely close to 8192.
> >>>
> >>> I agree that the merging part of the index creation is the last part,
> >>> and usually has no high impact on the total performance of the reindex
> >>> operation, but in memory-constrained environments releasing and then
> >>> requesting the same chunk of memory over and over again just isn't
> >>> great.
> >>
> >> OK, I'll take a look at the scratch context you suggested.
> >>
> >> My point however was we won't actually do that very often, because on
> >> large tables the BRIN ranges are likely smaller than the parallel scan
> >> chunk size, so few overlaps. OTOH if the table is small, or if the BRIN
> >> ranges are large, there'll be few of them.
> >
> > That's true, so maybe I'm concerned about something that amounts to
> > only marginal gains.
> >
>
> However, after thinking about this a bit more, I think we actually do
> need to do something about the memory management when merging tuples.
> AFAIK the general assumption was that union_tuple() only runs for a
> single range, and then the whole context gets freed.

Correct, but it is also assumed (or should be) that union_tuple will
be called several times in the same context to handle repeated
concurrent updates. Presumably that only happens rarely, but it's
something that should be kept in mind regardless.

> But the way the
> merging was implemented, it all runs in a single context. And while a
> single union_tuple() may not need a lot of memory, in total it may be
> annoying. I just added a palloc(1MB) into union_tuples and ended up with
> ~160MB allocated in the PortalContext on just a 2GB table. In practice the
> memory will grow more slowly, but not great :-/
>
> The attached 0003 patch adds a memory context that's reset after
> producing a merged BRIN tuple for each page range.

Looks good.

This also made me think a bit more about how we're working with the
tuples. With your latest patch, we always deserialize and re-serialize
the sorted BRIN tuples, just in case the next tuple belongs to the
same page range. Could we save some of that deserialization time by
optimistically assuming that no merge will be needed, and keeping only
a local copy of the serialized tuple? See attached 0002; this saves
some cycles in the common case.

The v20231128 version of the patchset (as squashed, attached v5-0001)
looks good to me.

Kind regards,

Matthias van de Meent
Neon (http://neon.tech)

Attachment Content-Type Size
v5-0002-Reduce-de-forming-of-BRIN-tuples-in-parallel-BRIN.patch application/octet-stream 4.9 KB
v5-0001-Allow-BRIN-to-build-its-index-in-parallel.patch application/octet-stream 51.6 KB
