| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Gregory Smith <gregsmithpgsql(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: PG18 GIN parallel index build crash - invalid memory alloc request size |
| Date: | 2025-10-26 15:16:07 |
| Message-ID: | d28e1299-d8d7-4c05-9a64-cc412f57026b@vondra.me |
| Lists: | pgsql-hackers |
On 10/24/25 22:22, Gregory Smith wrote:
> On Fri, Oct 24, 2025 at 8:38 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> Hmmm, I wonder if the m_w_m is high enough to confuse the trimming logic
> in some way. Can you try if using smaller m_w_m (maybe 128MB-256MB)
> makes the issue go away?
>
>
> The index builds at up to 4GB of m_w_m. 5GB and above crashes.
>
> Now that I know roughly where the limits are that de-escalates things a
> bit. The sort of customers deploying a month after release should be OK
> with just knowing to be careful about high m_w_m settings on PG18 until
> a fix is ready.
>
> Hope everyone is enjoying Latvia. My obscure music collection includes
> a band from there I used to see in the NYC area, The Quags:
> https://www.youtube.com/watch?v=Bg3P4736CxM
>
Nice!
> Can you show the contents of buffer and tup? I'm especially interested
> in these fields:
> buffer->nitems
> buffer->maxitems
> buffer->nfrozen
> tup->nitems
>
>
> I'll see if I can grab that data at the crash point.
>
> FYI for anyone who wants to replicate this: if you have a system with
> 128GB+ of RAM you could probably recreate the test case. Just have to
> download the Planet file and run osm2pgsql with the overly tweaked
> settings I use. I've published all the details of how I run this
> regression test now.
>
> Settings: https://github.com/gregs1104/pgbent/tree/main/conf/18/conf.d
> Script setup: https://github.com/gregs1104/pgbent/blob/main/wl/osm-import
> Test runner: https://github.com/gregs1104/pgbent/blob/main/util/osm-importer
> Parse results: https://github.com/gregs1104/pgbent/blob/main/util/pgbench-init-parse
>
I did reproduce this using OSM, although with different settings, but
those only affect loading. Setting maintenance_work_mem=20GB is more
than enough to trigger the error during a parallel index build.
So I don't need the data.
>
> If I'm right, I think there are two ways to fix this:
> (1) apply the trimming earlier, i.e. try to freeze + flush before
> actually merging the data (essentially, update nfrozen earlier)
> (2) use repalloc_huge (and palloc_huge) in GinBufferStoreTuple
> Or we probably should do both.
>
>
> Sounds like (2) is probably mandatory and (1) is good hygiene.
>
Yes, (2) is mandatory to fix this, and it's also sufficient. See the
attached fix. I'll clean this up and push soon.
AFAICS (1) is not really needed. I was concerned we might end up with
each worker producing a TID buffer close to maintenance_work_mem, and
then the leader would have to use twice as much memory when merging. But
it turns out I already thought about that: the workers use a fair
share of maintenance_work_mem, not a separate per-worker limit. So they
produce smaller chunks, and those should not exceed maintenance_work_mem
when merging.
I tried "freezing" the existing buffer more eagerly (before merging the
tuple), but that made no difference. The workers produce data with a lot
of overlaps (simply because that's how parallel builds divide the data),
so the amount of trimmed data is tiny: something like 10k TIDs from a
buffer of 1M TIDs. A tiny difference, and the build would still fail.
I'm not against experimenting with this, but it's going to be a
master-only thing, not for backpatching.
Maybe we should split the data into smaller chunks while building tuples
in ginFlushBuildState. That'd probably allow enforcing the memory limit
more strictly, because we sometimes hold multiple copies of the TID
arrays. But that's for master too.
regards
--
Tomas Vondra
| Attachment | Content-Type | Size |
|---|---|---|
| gin-palloc-fix.patch | text/x-patch | 848 bytes |