| From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
|---|---|
| To: | Vlad Lesin <vladlesin(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race |
| Date: | 2026-05-06 09:51:00 |
| Message-ID: | 83008704-B02E-4863-B989-9A3C666DEB13@yandex-team.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On 5 May 2026, at 21:31, Vlad Lesin <vladlesin(at)gmail(dot)com> wrote:
>
> It might make sense to commit the test.
Let's try to work on it, maybe we can bring it to good shape to consider for
a commit.
cc to Michael:
prockill_race needs to build the same InjectionPointCondition payload that
injection_wait consumes to know which PID to block. The struct is currently
private to injection_points.c, so the patch extracts it into a small header
that prockill_race.c includes via a relative "../injection_points/" path.
That works but feels non-idiomatic. Since injection_points grows organically
to support new bug reproducers anyway, making the condition type part of its
public header seems like a natural fit - but we are not sure the fix is
committable as-is, so we wanted to ask before doing any more cleanup: is
this refactor acceptable at all, and if so, would you prefer a proper
installed header (as contrib/pg_plan_advice does) over the relative include?
(attached step 0001, other steps are not about injection points infrastructure)
To Vlad:
I simplified the test as much as I could. The hand-rolled polling loop is
replaced with poll_query_until, big inline comments explaining the
controller-session trick and the PGPROC-scan rationale are moved into the
C helper functions where the mechanism lives, and outcome classification is
two plain ok() calls.
The fix is intact. I did not find a way to make it simpler - early
DisownLatch, decisions under leader_lwlock, deferred pushes under
freeProcsLock, and the leader-skips-self-push case each address a distinct
invariant the bug violated. What I have not fully reasoned through is
whether the new ordering affects any surrounding invariants I might have
missed, so a second further review of a bug fix would be good.
Also maybe consider alternative possible names to prockill_race that would be
idiomatic to stuff in test modules... It's kind of not relevant to current stage of the
patch, but anyway.
Thank you!
Best regards, Andrey Borodin.
| Attachment | Content-Type | Size |
|---|---|---|
| vAB2-0001-Extract-InjectionPointCondition-to-injection_po.patch | application/octet-stream | 3.3 KB |
| vAB2-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch | application/octet-stream | 7.8 KB |
| vAB2-0002-Add-regression-test-for-ProcKill-lock-group-pro.patch | application/octet-stream | 15.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ayush Tiwari | 2026-05-06 09:55:35 | Re: [PATCH] Fix WAIT FOR LSN cleanup on subtransaction abort |
| Previous Message | Postgres Cybrosys | 2026-05-06 09:47:52 | [PATCH] Make spelling consistent in waiteventset.c |