Quick Links

Re: Parallel CREATE INDEX for GIN indexes

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Vinod Sridharan <vsridh90(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Parallel CREATE INDEX for GIN indexes
Date:	2025-04-30 12:39:08
Message-ID:	203656b9-fbb1-4643-821a-9cce1b7edd81@vondra.me
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 4/18/25 03:03, Vinod Sridharan wrote:
> Hello,
> As part of testing this change I believe I found a scenario where the
> parallel build seems to trigger OOMs for larger indexes. Specifically,
> the calls for ginEntryInsert seem to leak memory into
> TopTransactionContext and OOM/crash the outer process.
> For serial build, the calls for ginEntryInsert tend to happen in a
> temporary memory context that gets reset at the end of the
> ginBuildCallback.
> For inserts, the call has a custom memory context and gets reset at
> the end of the insert.
> For parallel build, during the merge phase, the MemoryContext isn't
> swapped - and so this happens on the TopTransactionContext, and ends
> up growing (especially for larger indexes).
>

Yes, that's true. The ginBuildCallbackParallel() already releases memory
after flushing the in-memory state, but I missed _gin_parallel_merge()
needs to be careful about memory usage too.

I haven't been able to trigger OOM (or even particularly bad) memory
usage, but I suppose it might be an issue with custom GIN opclasses with
much wider keys.

> I believe at the very least these should happen inside the tmpCtx
> found in the GinBuildState and reset periodically.
>
> In the attached patch, I've tried to do this, and I'm able to build
> the index without OOMing, and only consuming maintenance_work_mem
> through the merge process.
>
> Would appreciate your thoughts on this (and whether there's other approaches to
> resolve this too).
>

The patch seems fine to me - I repeated the tests with mailing list
archives, with MemoryContextStats() in _gin_parallel_merge, and it
reliably minimizes the memory usage. So that's fine.

I was also worried if this might have performance impact, but it
actually seems to make it a little bit faster.

I'll get this pushed.

thanks

--
Tomas Vondra

In response to

Re: Parallel CREATE INDEX for GIN indexes at 2025-04-18 01:03:07 from Vinod Sridharan

Responses

Re: Parallel CREATE INDEX for GIN indexes at 2025-05-02 21:15:52 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Daniel Gustafsson	2025-04-30 12:55:07	Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message	Amit Kapila	2025-04-30 12:11:32	Re: Add an option to skip loading missing publication to avoid logical replication failure