| From: | Rahila Syed <rahilasyed90(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de> |
| Subject: | Segmentation fault on proc exit after dshash_find_or_insert |
| Date: | 2025-11-21 11:45:35 |
| Message-ID: | CAH2L28uSvyiosL+kaic9249jRVoQiQF6JOnaCitKFq=xiFzX3g@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,

If a process encounters a FATAL error after acquiring a dshash lock but
before releasing it, and it is not within a transaction, it can lead to a
segmentation fault.

The FATAL error causes the backend to exit, triggering proc_exit() and
similar functions. In the absence of a transaction, LWLockReleaseAll() is
delayed until ProcKill. ProcKill is an on_shmem_exit callback, and
dsm_backend_shutdown() is called before any on_shmem_exit callbacks are
invoked. Consequently, if a dshash lock was acquired before the FATAL error
occurred, the lock will only be released after dsm_backend_shutdown()
detaches the DSM segment containing the lock, resulting in a segmentation
fault.
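
The ordering described above can be sketched with a small toy model. This is not PostgreSQL code: ToySegment, toy_release_all() and toy_shmem_exit() are hypothetical names, and the model assumes (as the description implies) that a backend inside a transaction releases its locks before the DSM detach. Where the real backend would touch unmapped memory, the model only returns false.

```c
#include <stdbool.h>

/* Toy stand-in for a DSM segment that contains one dshash lock. */
typedef struct ToySegment
{
    bool mapped;     /* false once the segment has been detached */
    bool lock_held;  /* toy lock state stored inside the segment */
} ToySegment;

/*
 * Toy LWLockReleaseAll(): returns false at the point where the real code
 * would dereference memory that is no longer mapped.
 */
static bool
toy_release_all(ToySegment *seg)
{
    if (!seg->lock_held)
        return true;            /* nothing left to release */
    if (!seg->mapped)
        return false;           /* lock memory is gone: crash in real life */
    seg->lock_held = false;
    return true;
}

/*
 * Toy shmem_exit() after a FATAL error raised while the lock is held.
 * Returns true if the exit path never touched a detached segment.
 */
bool
toy_shmem_exit(bool in_transaction)
{
    ToySegment seg = { .mapped = true, .lock_held = true };

    /* Phase 1 (assumption): transaction abort releases locks early. */
    if (in_transaction)
        (void) toy_release_all(&seg);

    /* Phase 2: dsm_backend_shutdown() detaches the segment. */
    seg.mapped = false;

    /* Phase 3: on_shmem_exit callbacks, ProcKill releases locks last. */
    return toy_release_all(&seg);
}
```

In this model toy_shmem_exit(true) completes cleanly, while toy_shmem_exit(false) reaches the release only after the segment is gone, which corresponds to the crash reported here.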

Please find a reproducer attached. I have modified the test_dsm_registry
module to create a background worker that does nothing but throw a FATAL
error after acquiring the dshash lock. It must run in a background worker
so that it executes without a transaction.

To trigger the segmentation fault, apply the 0001-Reproducer* patch, run
make install in the test_dsm_registry module, add test_dsm_registry to
shared_preload_libraries in postgresql.conf, and start the server.

Please find attached a fix that calls LWLockReleaseAll() early in the
shmem_exit() routine. This ensures that the dshash lock is released before
dsm_backend_shutdown() is called. It also ensures that any subsequent
callbacks invoked in shmem_exit() will not fail to acquire any lock.
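
The second benefit, that later exit callbacks can still take locks, can be illustrated with another toy model. It is only a sketch under assumptions: ToyLock, toy_exit_callback() and toy_exit_with_held_lock() are hypothetical names, the lock is modelled as non-reentrant, and a failed acquire returns false where a real non-reentrant lock would typically block instead.

```c
#include <stdbool.h>

/* Toy non-reentrant lock: a second acquire by the same process fails. */
typedef struct ToyLock
{
    bool held;
} ToyLock;

static bool
toy_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;           /* already held by this process */
    lock->held = true;
    return true;
}

static void
toy_release(ToyLock *lock)
{
    lock->held = false;
}

/* Hypothetical exit callback that needs the same lock for its cleanup. */
static bool
toy_exit_callback(ToyLock *lock)
{
    if (!toy_acquire(lock))
        return false;
    toy_release(lock);
    return true;
}

/*
 * Toy exit path: the lock is taken, a FATAL error skips the normal
 * release, and then an exit callback runs. release_early models calling
 * LWLockReleaseAll() at the start of shmem_exit().
 */
bool
toy_exit_with_held_lock(bool release_early)
{
    ToyLock lock = { .held = false };

    (void) toy_acquire(&lock);  /* held when the FATAL error is raised */

    if (release_early)
        toy_release(&lock);

    return toy_exit_callback(&lock);
}
```

With release_early set the callback acquires the lock; without it the callback finds the lock still held by the exiting process.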

Please see the backtrace below.
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055a7515af56c in pg_atomic_fetch_sub_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
    at ../../../../src/include/port/atomics/generic-gcc.h:218
218         return __sync_fetch_and_sub(&ptr->value, sub_);
(gdb) bt
#0  0x000055a7515af56c in pg_atomic_fetch_sub_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
    at ../../../../src/include/port/atomics/generic-gcc.h:218
#1  0x000055a7515af625 in pg_atomic_sub_fetch_u32_impl (ptr=0x7f92c4b334f4, sub_=262144)
    at ../../../../src/include/port/atomics/generic.h:232
#2  0x000055a7515af709 in pg_atomic_sub_fetch_u32 (ptr=0x7f92c4b334f4, sub_=262144)
    at ../../../../src/include/port/atomics.h:441
#3  0x000055a7515b1583 in LWLockReleaseInternal (lock=0x7f92c4b334f0, mode=LW_EXCLUSIVE) at lwlock.c:1840
#4  0x000055a7515b1638 in LWLockRelease (lock=0x7f92c4b334f0) at lwlock.c:1902
#5  0x000055a7515b16e9 in LWLockReleaseAll () at lwlock.c:1951
#6  0x000055a7515ba63d in ProcKill (code=1, arg=0) at proc.c:953
#7  0x000055a7515913af in shmem_exit (code=1) at ipc.c:276
#8  0x000055a75159119b in proc_exit_prepare (code=1) at ipc.c:198
#9  0x000055a7515910df in proc_exit (code=1) at ipc.c:111
#10 0x000055a7517be71d in errfinish (filename=0x7f92ce41d062 "test_dsm_registry.c", lineno=187,
    funcname=0x7f92ce41d160 <__func__.0> "TestDSMRegistryMain") at elog.c:596
#11 0x00007f92ce41ca62 in TestDSMRegistryMain (main_arg=0) at test_dsm_registry.c:187
#12 0x000055a7514db00c in BackgroundWorkerMain (startup_data=0x55a752dd8028, startup_data_len=1472)
    at bgworker.c:846
#13 0x000055a7514de1e8 in postmaster_child_launch (child_type=B_BG_WORKER, child_slot=239,
    startup_data=0x55a752dd8028, startup_data_len=1472, client_sock=0x0) at launch_backend.c:268
#14 0x000055a7514e530d in StartBackgroundWorker (rw=0x55a752dd8028) at postmaster.c:4168
#15 0x000055a7514e55a4 in maybe_start_bgworkers () at postmaster.c:4334
#16 0x000055a7514e4200 in LaunchMissingBackgroundProcesses () at postmaster.c:3408
#17 0x000055a7514e205b in ServerLoop () at postmaster.c:1728
#18 0x000055a7514e18b0 in PostmasterMain (argc=3, argv=0x55a752dd0e70) at postmaster.c:1403
#19 0x000055a75138eead in main (argc=3, argv=0x55a752dd0e70) at main.c:231
```
Thank you,
Rahila Syed
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Reproducer-segmentation-fault-dshash.patch | application/octet-stream | 2.0 KB |
| 0001-Fix-the-seg-fault-during-proc-exit.patch | application/octet-stream | 1.2 KB |