SIGBUS coredumps in AllocSetContextCreateInternal()

From: "Peter 'PMc' Much" <pmc(at)citylink(dot)dinoex(dot)sub(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: SIGBUS coredumps in AllocSetContextCreateInternal()
Date: 2026-02-13 12:28:45
Message-ID: aY8Y_UHO-uajUnSa@disp.intra.daemon.contact
Lists: pgsql-general

Hi,

recently I got two postgres crashes on an installation that has been
running for years, without any significant recent changes.

Postgres is 15.15, OS is FreeBSD 14.3

The crashes are SIGBUS, happening on different db-clusters running on
the same node from the same binary:

col: Feb 2 03:52:57 LOG: background worker "parallel worker" (PID 79324) was terminated by signal 10: Bus error
int: Feb 12 03:38:03 LOG: background worker "parallel worker" (PID 26340) was terminated by signal 10: Bus error

On the second occurrence I looked into the coredump (which is sparse
because this is a production build):

* thread #1, name = 'postgres', stop reason = signal SIGBUS
* frame #0: 0x0000000829930ac3 libc.so.7`___lldb_unnamed_symbol5890 + 131
frame #1: 0x000000082992da28 libc.so.7`___lldb_unnamed_symbol5865 + 504
frame #2: 0x000000082992e889 libc.so.7`___lldb_unnamed_symbol5871 + 2617
frame #3: 0x000000082990ca84 libc.so.7`___lldb_unnamed_symbol5446 + 644
frame #4: 0x000000082990c6b7 libc.so.7`___lldb_unnamed_symbol5445 + 839
frame #5: 0x0000000829952945 libc.so.7`___lldb_unnamed_symbol6064 + 21
frame #6: 0x0000000829900013 libc.so.7`___lldb_unnamed_symbol5410 + 755
frame #7: 0x00000000009c0577 postgres`AllocSetContextCreateInternal + 199
frame #8: 0x00000000006d588c postgres`ExecAssignExprContext + 108
frame #9: 0x00000000006faab9 postgres`ExecInitSeqScan + 73
frame #10: 0x00000000006cf188 postgres`ExecInitNode + 248
frame #11: 0x00000000006c8440 postgres`standard_ExecutorStart + 1056
frame #12: 0x00000000006cca12 postgres`ParallelQueryMain + 402
frame #13: 0x0000000000585f79 postgres`ParallelWorkerMain + 985
frame #14: 0x00000000007bc606 postgres`StartBackgroundWorker + 310
frame #15: 0x00000000007c1f00 postgres`maybe_start_bgworkers + 1104
frame #16: 0x00000000007c0a43 postgres`sigusr1_handler + 307
frame #17: 0x00000008228aa606 libthr.so.3`___lldb_unnamed_symbol688 + 214
frame #18: 0x00000008228a9b0a libthr.so.3`___lldb_unnamed_symbol669 + 314
frame #19: 0x0000000821a402d3
frame #20: 0x00000000007c2545 postgres`ServerLoop + 1605
frame #21: 0x00000000007bffa3 postgres`PostmasterMain + 3251
frame #22: 0x0000000000720601 postgres`main + 801
frame #23: 0x0000000829803190 libc.so.7`__libc_start1 + 304
frame #24: 0x00000000004ff4e4 postgres`_start + 36

I'm not sure what to make of this. A single crash might be due to
a cosmic ray or whatever; a second occurrence usually means there
is something wrong.

That function AllocSetContextCreateInternal() seems to do some
memory allocation. That somehow explains the SIGBUS event, and
shifts the balance more to a software issue instead of a hardware
issue.
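
For illustration only, and not as a claim about what happened in these
workers: one well-known way to get a SIGBUS from an ordinary memory
access is to touch a page of a file-backed mapping after its backing
store has gone away. The minimal Python sketch below does that in a
child process and reports the signal that killed it. The helper name
demo_sigbus is made up for this example. On FreeBSD SIGBUS is signal 10,
which matches the log lines above; on Linux it is 7.

```python
import signal
import subprocess
import sys
import tempfile

# Child program: map one page of a file, shrink the file to zero length,
# then read from the mapping. The page no longer has backing store, so
# the kernel delivers SIGBUS on access.
CHILD = """
import mmap, os, sys
fd = os.open(sys.argv[1], os.O_RDWR)
os.ftruncate(fd, mmap.PAGESIZE)
m = mmap.mmap(fd, mmap.PAGESIZE)
os.ftruncate(fd, 0)
m[0]
"""


def demo_sigbus() -> int:
    """Run the child; a negative return value means 'killed by that signal'."""
    with tempfile.NamedTemporaryFile() as tmp:
        proc = subprocess.run([sys.executable, "-c", CHILD, tmp.name])
        return proc.returncode


if __name__ == "__main__":
    status = demo_sigbus()
    if status < 0:
        print("child killed by", signal.Signals(-status).name)
    else:
        print("child exited with status", status)
```

This only shows that a SIGBUS can come from memory that was mapped
successfully and fails later on first touch; whether anything similar
applies to the allocator path in the backtrace is an open question.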

Forensics from the logfiles tell me that in both cases the only
running task that might use parallel workers was a routine data
collection job that runs at least every night. It was a different
job in each case, with no common parts and no special plugins
or whatever, just plain SQL.
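
For anyone wanting to repeat this kind of log forensics, here is a
minimal Python sketch that pulls the crash events out of log text. It
assumes the line format of the two messages quoted earlier, and that the
leading "col:" / "int:" tag is a cluster name added by the local log
setup. The names Crash and find_crashes are made up for this example.

```python
import re
from typing import List, NamedTuple

# The two log lines quoted in this mail, used as sample input.
SAMPLE = '''\
col: Feb 2 03:52:57 LOG: background worker "parallel worker" (PID 79324) was terminated by signal 10: Bus error
int: Feb 12 03:38:03 LOG: background worker "parallel worker" (PID 26340) was terminated by signal 10: Bus error
'''

# Assumed layout: <cluster>: <Mon D HH:MM:SS> LOG: background worker "..." (PID n) ...
CRASH = re.compile(
    r'^(?P<cluster>\w+): (?P<when>\w{3} +\d+ \d\d:\d\d:\d\d) LOG: +'
    r'background worker "(?P<worker>[^"]+)" \(PID (?P<pid>\d+)\) '
    r'was terminated by signal (?P<signo>\d+): (?P<reason>.*)$'
)


class Crash(NamedTuple):
    cluster: str
    when: str
    worker: str
    pid: int
    signo: int
    reason: str


def find_crashes(text: str) -> List[Crash]:
    """Return one Crash per 'terminated by signal' line found in text."""
    crashes = []
    for line in text.splitlines():
        m = CRASH.match(line.strip())
        if m:
            crashes.append(Crash(
                m["cluster"], m["when"], m["worker"],
                int(m["pid"]), int(m["signo"]), m["reason"],
            ))
    return crashes


if __name__ == "__main__":
    for crash in find_crashes(SAMPLE):
        print(crash)
```

The extracted timestamps can then be compared against the job schedule
to see which tasks were running at the time.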

In between the two crashes, the postgres binaries were updated and
the system subsequently rebooted.

The system does not report any hardware issues, nor any failures
in other running applications. Memory ECC exists, and does
actually work - I've seen that in the past.

The two clusters use different physical disks.

Postgres configuration is mostly as recommended. I was surprised to
find that we now use *three* different shared-memory allocation tools,
but the manual is clear about that:
* 4096 bytes from SysV shm (visible with ipcs)
* shared buffers, apparently from anonymous mmap() - nowhere visible
in the system
* dynamic shared memory from POSIX shm - these segments are visible
with posixshmcontrol.

Some sources say postgres would access the shared memory via handles
under /dev/shm. But this is not possible because /dev/shm does not
exist (by default on FreeBSD jails).

Furthermore, the manual says postgres uses "a significant number" of
semaphores, and that these are *not* SysV sem. They are not POSIX
semaphores either, because those do not exist here - one would need to
build a custom kernel to get them (according to "man 4 sem").

So far, this does not shed much light on the issue, except insofar
as the "dynamic shared memory" seems historically intended specifically
for parallel workers. One could assume a kind of coincidence, but
looking closer, there are always some of these Posix shm present, on
every cluster, and right from the start, parallel workers or not:
# posixshmcontrol list
MODE OWNER GROUP SIZE PATH
rw------- postgres postgres 30976 /PostgreSQL.1991522144
rw------- postgres postgres 2097152 /PostgreSQL.45072524
rw------- postgres postgres 1048576 /PostgreSQL.1450298

Here are my config adjustments, as far as they might relate
to memory allocation:

max_connections = 60 # (change requires restart)
shared_buffers = 40MB # min 128kB
temp_buffers = 20MB # min 800kB
work_mem = 50MB # min 64kB
maintenance_work_mem = 50MB # min 1MB
max_stack_depth = 40MB # min 100kB
dynamic_shared_memory_type = posix # the default is the first option
max_files_per_process = 200 # min 25
effective_io_concurrency = 5 # 1-1000; 0 disables prefetching
synchronous_commit = off # synchronization level;
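
As a rough sanity check on these numbers, a minimal Python sketch of the
worst-case arithmetic. It rests on simplifying assumptions that are not
measurements from this system: every one of the max_connections backends
uses its full temp_buffers plus a given number of work_mem-sized
allocations at the same time, while parallel workers,
maintenance_work_mem and all other per-process overhead are ignored. The
function name rough_peak_mb is made up for this example.

```python
# Values in megabytes, taken from the settings listed above.
SETTINGS_MB = {
    "max_connections": 60,
    "shared_buffers": 40,
    "temp_buffers": 20,
    "work_mem": 50,
}


def rough_peak_mb(s: dict, work_mem_uses: int = 1) -> int:
    """Crude upper estimate in MB under the assumptions stated above.

    work_mem_uses is a hypothetical count of simultaneous sort/hash
    operations per backend; work_mem applies per operation, not per
    connection, so values above 1 are possible.
    """
    per_backend = s["temp_buffers"] + work_mem_uses * s["work_mem"]
    return s["shared_buffers"] + s["max_connections"] * per_backend


if __name__ == "__main__":
    print(rough_peak_mb(SETTINGS_MB))      # 40 + 60 * (20 + 50)  = 4240
    print(rough_peak_mb(SETTINGS_MB, 2))   # 40 + 60 * (20 + 100) = 7240
```

The point is only the order of magnitude to compare against RAM plus
swap on the node, not a prediction of actual usage.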

-- PMc
