| From: | Vlad Lesin <vladlesin(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | [PATCH] Fix ProcKill lock-group vs procLatch recycle race |
| Date: | 2026-04-27 08:14:59 |
| Message-ID: | d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com |
| Lists: | pgsql-hackers |
Hello all,
Below are patches and demonstration material for a concurrency bug in
ProcKill() that occurs when a lock-group leader and a member exit in
parallel.
------------------------------------------------------------------------
Problem
------------------------------------------------------------------------
If a leader detaches from the lock group under leader_lwlock but
has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
last follower can still put the *leader* PGPROC on a free list, or
the leader and the follower can make inconsistent decisions about
*who* returns which PGPROC, so that a slot is linked into the free
list with procLatch still owned, or is pushed twice. A new backend
that recycles the slot can then hit:
PANIC: latch already owned by PID ...
A concrete interleaving (lock group leader vs. last member) follows,
based on the PG15 code.
=== Lock group leader ===
L1: LWLockAcquire(leader_lwlock)
L2: dlist_delete(&MyProc->lockGroupLink)
The list still contains the follower.
L3: dlist_is_empty → false (the follower is still in)
L4: else if (leader != MyProc) → false, do nothing
L5: LWLockRelease(leader_lwlock)
*lwlock released*
>>>WINDOW OPEN HERE<<<
L6: SwitchBackToLocalLatch()
L7: pgstat_reset_wait_event_storage()
L8: proc = MyProc; MyProc = NULL;
L9: DisownLatch(&proc->procLatch)
*only here owner_pid = 0*
L10: SpinLockAcquire(ProcStructLock)
L11: if (proc->lockGroupLeader == NULL) → false, skip
L12: SpinLockRelease(ProcStructLock)
===========================
=== What the last lock group member does in that WINDOW ===
M1: LWLockAcquire(leader_lwlock)
Can proceed as soon as L5 releases it
M2: dlist_delete(&MyProc->lockGroupLink)
The list becomes EMPTY
M3: dlist_is_empty → true
M4: leader->lockGroupLeader = NULL
Flip leader's "don't free me" flag
M5: leader != MyProc → true
M6: SpinLockAcquire(ProcStructLock)
M7: push leader's PGPROC onto freelist
!!! while leader's latch is still owned !!!
M8: SpinLockRelease(ProcStructLock)
M9: LWLockRelease(leader_lwlock)
===========================================================
At M7, the leader's PGPROC is back on freeProcs with
procLatch.owner_pid == leader_pid (non-zero). The leader is still
sitting between L5 and L9.
=== The new backend picks it up in InitProcess() ===
B1: SpinLockAcquire(ProcStructLock)
B2: MyProc = *procgloballist
pops the leader's PGPROC
B3: *procgloballist = MyProc->links.next
B4: SpinLockRelease(ProcStructLock)
B5: ...
B6: OwnLatch(&MyProc->procLatch)
→ owner_pid != 0
→ elog(PANIC, "latch already owned by PID %d", owner_pid);
=====================================================
Note that all PG versions since 9.6, which introduced lock groups, are
affected.
The fix is to run SwitchBackToLocalLatch(),
pgstat_reset_wait_event_storage(), and DisownLatch() before the
lock-group block, and to make the push decisions (push_leader /
push_self) under leader_lwlock, with a single freelist-update section
at the end. This matches what we implemented; detailed commentary is in
the commit messages and in the patches.
Mainline parallel query usually avoids the race: the leader is not
expected to reach ProcKill() with another group member still running,
in the way some extension-driven workloads can.
------------------------------------------------------------------------
Testing
------------------------------------------------------------------------
Ideally, a regression test would accompany the fix. However, developing
one is non-trivial with the current PG18 injection-point
implementation: ProcKill() partially de-initializes data that injection
points need in order to function, so a significant workaround is
required. Given that complexity, I would like to discuss whether a test
is mandatory for this patch.
I have provided versions without the test (0001, 0002) as well as a
test that uses an injection-point workaround to reproduce the bug
deterministically (0003). If a test for the bug fix is required, please
note that I can only provide it for PG17+, as earlier versions do not
support injection points.
------------------------------------------------------------------------
Attached patches
------------------------------------------------------------------------
1) 0001-ProcKill-REL15-lockgroup-freelist-race.patch
Core proc.c fix for REL_15_STABLE.
No TAP test: PG15 has no injection-point support suitable for a core
reproducer, so this is fix-only for that branch.
2) 0002-ProcKill-PG18-core-lockgroup-freelist-race.patch
Same *core* fix for PG18: proc.c only.
3) 0003-PG18-unfixed-repro-tap-injection-harness.patch
*Demonstration only.* Shows the PANIC on an unfixed tree; not in a
shape to be committed on top of the fix without reworking the test.
--
Best regards,
Vlad
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-ProcKill-REL15-lockgroup-freelist-race.patch | text/x-patch | 8.0 KB |
| 0002-ProcKill-PG18-core-lockgroup-freelist-race.patch | text/x-patch | 7.1 KB |
| 0003-PG18-unfixed-repro-tap-injection-harness.patch | text/x-patch | 33.0 KB |