| From: | Vlad Lesin <vladlesin(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race |
| Date: | 2026-05-15 12:29:30 |
| Message-ID: | d0b9baa2-7445-40f6-b994-c36125e70b0d@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello Michael, thank you for your review.
I am sending a new series of patches that incorporate your code review
notes. The version prefix is "vVL2-" to distinguish Andrey's patches
from mine.
On 5/11/26 04:58, Michael Paquier wrote:
> On Thu, May 07, 2026 at 04:57:28PM +0500, Andrey Borodin wrote:
>> Done so. I still keep Vlad's injection_point_condition.h though.
>> It seems useful. But no more Meson\Makefile changes, and new sql stuff lives
>> now near removable_cutoff() SQL.
>
> Because you want to have your callbacks in a new regress_prockill.c
> while being able to reuse the structures. I don't see why not on
> clarity ground. The creation of injection_point_condition.h could be
> done in its own commit, independent of the rest, but that's just me
> being pedantic about such matters. I think that this should be
> backpatched anyway, that could make the introduction of new tests
> easier.
I moved the creation of injection_point_condition.h into a separate
commit in 0001 patch. Additionally, I noticed that if several processes
are waiting on the same injection point, only one of them will be
awakened by a single injection_points_wakeup() call. I am not sure if
this behavior is intentional; please let me know.
For context, MySQL has a similar feature called debug sync points [0],
which allows waking up several threads waiting on them. I suppose this
same semantics would be useful for injection points as well to simplify
tests. I implemented this behavior in the 0002 patch and utilized it in
0003. If you disagree with this approach, I can rewrite the test to use
two separate injection points for the lock group leader and follower.
> +# Two backends form a lock group, then are terminated concurrently. The test
> +# uses injection points placed inside ProcKill() to pause both victims there
> +# and verify that a freshly forked backend can claim the recycled PGPROC slot
> +# without hitting "latch already owned by PID ..." PANIC.
>
> This claim is slightly incorrect here. If I undo the fix, place the
> two INJECTION_POINT macros and re-run the test, the test does not
> produce a PANIC. In order to reproduce a PANIC, it seems that we
> would need something more complicated with an extra point after a
> DisownLatch(). That is not required to me, still this is misleading
> and could be improved, I guess..
Yes, you are right, I fixed the race in 0003 patch with additional
injection point.
> I think that we should refactor the patch into two pieces,
> for clarity:
> - Patch 1 refactors the freelist manipulations and introduces the two
> new boolean flags, documenting more precily how the freelists are
> manipulated and why things are done this way (leader, self, etc.).
> - Patch 2, which is about the earlier DisownLatch(), then becomes a
> fix of its own. That's the primary fix we care about, to avoid the
> entries to be recycled due to a too-early PGPROC entry marked as
> available.
Done. I have split this into two separate patches, 0004 and 0005. The
test in 0003 fails on patch 0004 and passes on 0005.
> Also, I suspect that the test is racy? For one, I strongly suspect
> that you are going to need tweaks similar 011_lock_stats.pl in
> test_misc where we rely on some query_until() with psql banners, to
> make sure that the waits actually happen.. I got the point about
> pg_stat_activity not being something we can rely on here. That's
> something that would be easier to see on an overloaded machine.
The test should no longer be racy following the changes in 0003.
--
Best regards,
Vlad
[0] https://dev.mysql.com/doc/dev/mysql-server/9.6.0/PAGE_DEBUG_SYNC.html
| Attachment | Content-Type | Size |
|---|---|---|
| vVL2-0001-injection_points-extract-condition-header.patch | text/x-patch | 4.1 KB |
| vVL2-0002-injection_points-wake-every-matching-waiter.patch | text/x-patch | 3.1 KB |
| vVL2-0003-add-prockill-lockgroup-regression-test.patch | text/x-patch | 18.8 KB |
| vVL2-0004-fix-lockgroup-double-push-and-leak.patch | text/x-patch | 8.1 KB |
| vVL2-0005-fix-prockill-lockgroup-procLatch-recycle-race.patch | text/x-patch | 4.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Xuneng Zhou | 2026-05-15 12:53:50 | Re: Bound memory usage during manual slot sync retries |
| Previous Message | Zsolt Parragi | 2026-05-15 11:56:46 | Re: COPY ON_CONFLICT TABLE; save duplicated record to another table. |