Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Vlad Lesin <vladlesin(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race
Date: 2026-05-06 09:51:00
Message-ID: 83008704-B02E-4863-B989-9A3C666DEB13@yandex-team.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 5 May 2026, at 21:31, Vlad Lesin <vladlesin(at)gmail(dot)com> wrote:
>
> It might make sense to commit the test.

Let's try to work on it, maybe we can bring it to good shape to consider for
a commit.

cc to Michael:

prockill_race needs to build the same InjectionPointCondition payload that
injection_wait consumes to know which PID to block. The struct is currently
private to injection_points.c, so the patch extracts it into a small header
that prockill_race.c includes via a relative "../injection_points/" path.
That works but feels non-idiomatic. Since injection_points grows organically
to support new bug reproducers anyway, making the condition type part of its
public header seems like a natural fit - but we are not sure the fix is
committable as-is, so we wanted to ask before doing any more cleanup: is
this refactor acceptable at all, and if so, would you prefer a proper
installed header (as contrib/pg_plan_advice does) over the relative include?

(attached step 0001, other steps are not about injection points infrastructure)

To Vlad:

I simplified the test as much as I could. The hand-rolled polling loop is
replaced with poll_query_until, big inline comments explaining the
controller-session trick and the PGPROC-scan rationale are moved into the
C helper functions where the mechanism lives, and outcome classification is
two plain ok() calls.

The fix is intact. I did not find a way to make it simpler - early
DisownLatch, decisions under leader_lwlock, deferred pushes under
freeProcsLock, and the leader-skips-self-push case each address a distinct
invariant the bug violated. What I have not fully reasoned through is
whether the new ordering affects any surrounding invariants I might have
missed, so a second further review of a bug fix would be good.

Also maybe consider alternative possible names to prockill_race that would be
idiomatic to stuff in test modules... It's kind of not relevant to current stage of the
patch, but anyway.

Thank you!

Best regards, Andrey Borodin.

Attachment Content-Type Size
vAB2-0001-Extract-InjectionPointCondition-to-injection_po.patch application/octet-stream 3.3 KB
vAB2-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch application/octet-stream 7.8 KB
vAB2-0002-Add-regression-test-for-ProcKill-lock-group-pro.patch application/octet-stream 15.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ayush Tiwari 2026-05-06 09:55:35 Re: [PATCH] Fix WAIT FOR LSN cleanup on subtransaction abort
Previous Message Postgres Cybrosys 2026-05-06 09:47:52 [PATCH] Make spelling consistent in waiteventset.c