From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: PG18 GIN parallel index build crash - invalid memory alloc request size
Date: 2025-10-24 03:03:49
Message-ID: CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com
Lists: pgsql-hackers

Testing PostgreSQL 18.0 on Debian from the PGDG repo: 18.0-1.pgdg12+3 with
PostGIS 3.6.0+dfsg-2.pgdg12+1, running the osm2pgsql workload to load the
entire OSM Planet data set on my home lab system.

I found a weird crash in the recently adjusted parallel GIN index build
code. Two parallel workers spawn, one of them crashes, and then everything
terminates. This is one of the last steps of the OSM load, and I can
reproduce it just by running the one statement again:

gis=# CREATE INDEX ON "public"."planet_osm_polygon" USING GIN (tags);
ERROR: invalid memory alloc request size 1113001620

I see this area of the code was being triaged during early beta back in
May; it may need another round.

The table is 215 GB. The server has 128 GB of RAM and only about a third of
it is nailed down, so there's plenty of memory available.
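
For anyone following along, the table size can be checked with a standard
size query, something like:

SELECT pg_size_pretty(pg_total_relation_size('public.planet_osm_polygon'));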

Settings include:
work_mem = 1GB
maintenance_work_mem = 20GB
shared_buffers = 48GB
max_parallel_workers_per_gather = 8
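
One note on the parallel side: CREATE INDEX workers are governed by
max_parallel_maintenance_workers rather than max_parallel_workers_per_gather,
and I haven't listed that one above; its default of 2 would match the two
workers spawning here. Checking it is just:

SHOW max_parallel_maintenance_workers;  -- defaults to 2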

The log files show a number of similarly large allocations succeeding
before that point; here's an example:

LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp161831.0.fileset/0.1", size 1073741824
STATEMENT: CREATE INDEX ON "public"."planet_osm_polygon" USING BTREE (osm_id)
ERROR: invalid memory alloc request size 1137667788
STATEMENT: CREATE INDEX ON "public"."planet_osm_polygon" USING GIN (tags)
CONTEXT: parallel worker

And another run, to show that the size at crash is a little different each time:
ERROR: Database error: ERROR: invalid memory alloc request size 1115943018

Hooking into the error gave this stack trace:

#0 errfinish (filename=0x5646de247420 "./build/../src/backend/utils/mmgr/mcxt.c",
    lineno=1174, funcname=0x5646de2477d0 <__func__.3> "MemoryContextSizeFailure")
    at ./build/../src/backend/utils/error/elog.c:476
#1 0x00005646ddb4ae9c in MemoryContextSizeFailure (
    context=context@entry=0x56471ce98c90, size=size@entry=1136261136,
    flags=flags@entry=0) at ./build/../src/backend/utils/mmgr/mcxt.c:1174
#2 0x00005646de05898d in MemoryContextCheckSize (flags=0, size=1136261136,
    context=0x56471ce98c90) at ./build/../src/include/utils/memutils_internal.h:172
#3 MemoryContextCheckSize (flags=0, size=1136261136, context=0x56471ce98c90)
    at ./build/../src/include/utils/memutils_internal.h:167
#4 AllocSetRealloc (pointer=0x7f34f558b040, size=1136261136, flags=0)
    at ./build/../src/backend/utils/mmgr/aset.c:1203
#5 0x00005646ddb701c8 in GinBufferStoreTuple (buffer=0x56471cee0d10,
    tup=0x7f34dfdd2030) at ./build/../src/backend/access/gin/gininsert.c:1497
#6 0x00005646ddb70503 in _gin_process_worker_data (progress=<optimized out>,
    worker_sort=0x56471cf13638, state=0x7ffc288b0200)
    at ./build/../src/backend/access/gin/gininsert.c:1926
#7 _gin_parallel_scan_and_build (state=state@entry=0x7ffc288b0200,
    ginshared=ginshared@entry=0x7f4168a5d360,
    sharedsort=sharedsort@entry=0x7f4168a5d300, heap=heap@entry=0x7f41686e5280,
    index=index@entry=0x7f41686e4738, sortmem=<optimized out>,
    progress=<optimized out>) at ./build/../src/backend/access/gin/gininsert.c:2046
#8 0x00005646ddb71ebf in _gin_parallel_build_main (seg=<optimized out>,
    toc=0x7f4168a5d000) at ./build/../src/backend/access/gin/gininsert.c:2159
#9 0x00005646ddbdf882 in ParallelWorkerMain (main_arg=<optimized out>)
    at ./build/../src/backend/access/transam/parallel.c:1563
#10 0x00005646dde40670 in BackgroundWorkerMain (startup_data=<optimized out>,
    startup_data_len=<optimized out>)
    at ./build/../src/backend/postmaster/bgworker.c:843
#11 0x00005646dde42a45 in postmaster_child_launch (
    child_type=child_type@entry=B_BG_WORKER, child_slot=320,
    startup_data=startup_data@entry=0x56471cdbc8f8,
    startup_data_len=startup_data_len@entry=1472,
    client_sock=client_sock@entry=0x0)
    at ./build/../src/backend/postmaster/launch_backend.c:290
#12 0x00005646dde44265 in StartBackgroundWorker (rw=0x56471cdbc8f8)
    at ./build/../src/backend/postmaster/postmaster.c:4157
#13 maybe_start_bgworkers () at ./build/../src/backend/postmaster/postmaster.c:4323
#14 0x00005646dde45b13 in LaunchMissingBackgroundProcesses ()
    at ./build/../src/backend/postmaster/postmaster.c:3397
#15 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1717
#16 0x00005646dde47f6d in PostmasterMain (argc=argc@entry=5,
    argv=argv@entry=0x56471cd66dc0)
    at ./build/../src/backend/postmaster/postmaster.c:1400
#17 0x00005646ddb4d56c in main (argc=5, argv=0x56471cd66dc0)
    at ./build/../src/backend/main/main.c:227

I've frozen my testing at the spot where I can reproduce the problem. My
next experiments were going to be dropping maintenance_work_mem and turning
off the parallel build (sketched below), but I didn't want to touch anything
until asking whether there's more data that should be collected from a
crashing instance.
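
For reference, this is roughly what I had in mind, one change at a time;
the statements are untested so far and the lower memory value is just a
guess:

-- Retry with the parallel build disabled for this session:
SET max_parallel_maintenance_workers = 0;
CREATE INDEX ON "public"."planet_osm_polygon" USING GIN (tags);

-- Separately, retry the parallel build with a much smaller memory budget:
RESET max_parallel_maintenance_workers;
SET maintenance_work_mem = '1GB';
CREATE INDEX ON "public"."planet_osm_polygon" USING GIN (tags);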

--
Greg Smith, Software Engineering
Snowflake - Where Data Does More
gregory(dot)smith(at)snowflake(dot)com
