| From: | Vlad Lesin <vladlesin(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | [PATCH] Fix ProcKill lock-group vs procLatch recycle race |
| Date: | 2026-04-27 08:14:59 |
| Message-ID: | d2983796-2603-41b7-a66e-fc8489ddb954@gmail.com |
| Lists: | pgsql-hackers |
Hello all,
Below are patches and demonstration material for a concurrency bug in
ProcKill() that occurs when a lock-group leader and a member exit in
parallel.
------------------------------------------------------------------------
Problem
------------------------------------------------------------------------
If a leader detaches from the lock group under leader_lwlock but
has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
last follower can still put the *leader* PGPROC on a free list, or
the leader and the follower can make inconsistent decisions about
*who* returns which PGPROC, so that a slot is linked into the free
list with procLatch still owned, or is pushed twice. A new backend
that recycles the slot can then hit:
PANIC: latch already owned by PID ...
A concrete interleaving (lock group leader vs. last member) follows,
based on the PG15 code.
=== Lock group leader ===
L1: LWLockAcquire(leader_lwlock)
L2: dlist_delete(&MyProc->lockGroupLink)
The list still contains the follower.
L3: dlist_is_empty → false (the follower is still in)
L4: else if (leader != MyProc) → false, do nothing
L5: LWLockRelease(leader_lwlock)
*lwlock released*
>>>WINDOW OPEN HERE<<<
L6: SwitchBackToLocalLatch()
L7: pgstat_reset_wait_event_storage()
L8: proc = MyProc; MyProc = NULL;
L9: DisownLatch(&proc->procLatch)
*only here owner_pid = 0*
L10: SpinLockAcquire(ProcStructLock)
L11: if (proc->lockGroupLeader == NULL) → false, skip
L12: SpinLockRelease(ProcStructLock)
===========================
=== What the last lock group member does in that WINDOW ===
M1: LWLockAcquire(leader_lwlock)
Can proceed as soon as L5 releases it
M2: dlist_delete(&MyProc->lockGroupLink)
The list becomes EMPTY
M3: dlist_is_empty → true
M4: leader->lockGroupLeader = NULL
Flip leader's "don't free me" flag
M5: leader != MyProc → true
M6: SpinLockAcquire(ProcStructLock)
M7: push leader's PGPROC onto freelist
!!! while leader's latch is still owned !!!
M8: SpinLockRelease(ProcStructLock)
M9: LWLockRelease(leader_lwlock)
===========================================================
At M7, the leader's PGPROC is back on freeProcs with
procLatch.owner_pid == leader_pid (non-zero). The leader is still
sitting between L5 and L9.
=== The new backend picks it up in InitProcess() ===
B1: SpinLockAcquire(ProcStructLock)
B2: MyProc = *procgloballist
pops the leader's PGPROC
B3: *procgloballist = MyProc->links.next
B4: SpinLockRelease(ProcStructLock)
B5: ...
B6: OwnLatch(&MyProc->procLatch)
→ owner_pid != 0
→ elog(PANIC, "latch already owned by PID %d", owner_pid);
=====================================================
Note that all PG versions since 9.6, which introduced lock groups, are
affected.
The fix is to run SwitchBackToLocalLatch(),
pgstat_reset_wait_event_storage(), and DisownLatch() before the
lock-group block, and to make the push decisions (push_leader /
push_self) under leader_lwlock, with a single freelist-update section
at the end. This matches what we implemented; detailed commentary is in
the commit messages and in the patches.
Mainline parallel query usually avoids the race: the leader is not
expected to reach ProcKill() with another group member still running,
in the way some extension-driven workloads can.
------------------------------------------------------------------------
Testing
------------------------------------------------------------------------
Ideally, a regression test would accompany the fix. However, developing
one is non-trivial with the current PG18 injection-point
implementation: ProcKill() partially de-initializes data that injection
points need in order to function, so a significant workaround is
required. Given that complexity, I would like to discuss whether a test
is mandatory for this patch.
I have provided versions without the test (0001, 0002) as well as a
test that uses an injection-point workaround to reproduce the bug
deterministically (0003). If a test for the bug fix is required, please
note that I can only provide it for PG17+, as earlier versions do not
support injection points.
------------------------------------------------------------------------
Attached patches
------------------------------------------------------------------------
1) 0001-ProcKill-REL15-lockgroup-freelist-race.patch
Core proc.c fix for REL_15_STABLE.
No TAP test: PG15 has no injection-point support suitable for a core
reproducer, so this is fix-only for that branch.
2) 0002-ProcKill-PG18-core-lockgroup-freelist-race.patch
Same *core* fix for PG18: proc.c only.
3) 0003-PG18-unfixed-repro-tap-injection-harness.patch
*Demonstration only.* Shows the PANIC on an unfixed tree; not in a
shape to be committed on top of the fix without reworking the test.
--
Best regards,
Vlad
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-ProcKill-REL15-lockgroup-freelist-race.patch | text/x-patch | 8.0 KB |
| 0002-ProcKill-PG18-core-lockgroup-freelist-race.patch | text/x-patch | 7.1 KB |
| 0003-PG18-unfixed-repro-tap-injection-harness.patch | text/x-patch | 33.0 KB |