Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Ilmar Yunusov <tanswis42(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, JoongHyuk Shin <sjh910805(at)gmail(dot)com>
Subject: Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
Date: 2026-06-25 11:58:29
Message-ID: CABPTF7U0eLO2_7_j_=aTX7Nczs65ND=n9C3G5NRy1ODVvVCmfw@mail.gmail.com
Lists: pgsql-hackers

Hi Ilmar, JoongHyuk,

On Wed, Jun 3, 2026 at 6:23 PM Ilmar Yunusov <tanswis42(at)gmail(dot)com> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world: not tested
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: not tested
>
> Hi,
>
> I looked at v3 again, this time on Linux, focusing on the repeated SIGUSR1
> behavior and the log_recovery_conflict_waits timing issue I reported for v2.
>
> I used the v3 attachment from JoongHyuk's 2026-06-03 message, on
> origin/master at f2081a7800f1696cb0415bacd655cb41b7b9ca63.
>
> The patch applies cleanly with git am, and git diff --check reports no
> issues.
>
> I built with:
>
> ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu --enable-tap-tests
> make -s -j3
> make -s install
>
> That passed.
>
> The new targeted TAP test passed:
>
> make -C src/test/recovery check PROVE_TESTS=t/054_bufferpin_conflict_log_timing.pl
>
> Result:
>
> t/054_bufferpin_conflict_log_timing.pl .. ok
> All tests successful.
> Files=1, Tests=3
> Result: PASS
>
> I also ran the full recovery TAP suite:
>
> make -C src/test/recovery check
>
> That passed too:
>
> All tests successful.
> Files=53, Tests=633
> Result: PASS
>
> Six tests were skipped because injection points were not supported by this
> build.
>
> For the signal behavior, I ran the same buffer-pin conflict reproducer under
> strace on the standby postmaster and its children:
>
> strace -ff -qq -e trace=kill,tgkill,tkill
>
> The count below is for kill/tgkill/tkill(..., SIGUSR1) syscalls during the
> conflict window, after subtracting signals already seen before VACUUM FREEZE.
>
> On unpatched master:
>
> sigusr1_delta=51
> recovery still waiting after 100.442 ms: recovery conflict on buffer pin
> terminating connection due to conflict with recovery
> recovery finished waiting after 5001.455 ms: recovery conflict on buffer pin
>
> With v3:
>
> sigusr1_delta=2
> recovery still waiting after 100.479 ms: recovery conflict on buffer pin
> terminating connection due to conflict with recovery
> recovery finished waiting after 5001.778 ms: recovery conflict on buffer pin
>
> I interpret the two v3 SIGUSR1 syscalls as the one deadlock-check signal and
> the final cancellation signal at max_standby_streaming_delay. So in this
> repro, v3 removes the repeated deadlock-check signals every deadlock_timeout,
> while keeping the "recovery still waiting" log near deadlock_timeout.
>
> I did not find a new issue in the checked path.
>
> I have not reviewed the backpatching question, and I did not run
> installcheck-world.
>
> Regards,
> Ilmar Yunusov
>
> The new status of this patch is: Ready for Committer
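For anyone who wants to repeat the SIGUSR1 count above, here is a minimal sketch of one way to compute a sigusr1_delta from strace output. It assumes that a snapshot of the same append-only strace log was taken just before VACUUM FREEZE to serve as the baseline. The script and helper names are hypothetical and are not part of Ilmar's setup. It only looks for kill/tkill/tgkill calls whose argument list contains SIGUSR1, so it does not depend on the "[pid N]" prefix.

```python
import re
import sys

# Matches kill(pid, SIGUSR1), tkill(tid, SIGUSR1) and tgkill(tgid, tid, SIGUSR1),
# including calls that strace splits as "<unfinished ...>". The matching
# "<... kill resumed>" line carries no signal name, so it is not counted twice.
# Signal-delivery lines such as "--- SIGUSR1 {...} ---" are not syscalls and are ignored.
SIGUSR1_CALL = re.compile(r"\b(?:kill|tkill|tgkill)\([^)]*?\bSIGUSR1\b")


def count_sigusr1(lines):
    """Count strace lines that record a SIGUSR1-sending syscall (hypothetical helper)."""
    return sum(1 for line in lines if SIGUSR1_CALL.search(line))


def sigusr1_delta(baseline_lines, final_lines):
    """Signals sent during the conflict window, assuming baseline_lines is a
    snapshot of the same append-only log taken before VACUUM FREEZE."""
    return count_sigusr1(final_lines) - count_sigusr1(baseline_lines)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python sigusr1_delta.py baseline.log final.log
    with open(sys.argv[1]) as before, open(sys.argv[2]) as after:
        print(f"sigusr1_delta={sigusr1_delta(before, after)}")
```

Counting both snapshots the same way and subtracting keeps the baseline handling trivial, at the cost of reading the log prefix twice.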

While working on a fix for [1], I noticed the same TODO comment about
ResolveRecoveryConflictWithBufferPin() in nearby code, and some
background research on it led me to this thread.

It seems to me that this thread is closely related to the bug in [2].
In their latest versions, both patches change the ownership of the same
wait loop, its timeout lifecycle, and waiter registration. Because of
that, I wonder whether the bug fix should land first, with this patch
rebased on top of it.

If so, I think the patch status should probably be changed to Waiting
on Author so that it isn't picked up for commit prematurely, although
both patches are already on Fujii-san's radar.
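To make the repeated-signal behavior of that wait loop concrete, here is a toy model of how many SIGUSR1s are sent to a single pin holder during one buffer-pin conflict. It is my own simplified reading, not PostgreSQL code. In it, the unpatched loop re-arms the deadlock timeout on every pass and sends a deadlock-check signal each time it fires, while the patched behavior sends that signal only once. Both versions end with one cancellation at the standby delay. The settings deadlock_timeout = 100ms and max_standby_streaming_delay = 5s are my assumption, inferred from the logged wait times in Ilmar's repro. The model also ignores scheduling overhead and the time the backend takes to exit.

```python
def modeled_sigusr1_count(deadlock_timeout_ms, standby_delay_ms, repeat_deadlock_check):
    """Toy model (hypothetical, not PostgreSQL code) of SIGUSR1s sent to one pin
    holder during a buffer pin conflict.

    Each pass of the wait loop re-arms a deadlock timeout. If it expires before
    the standby delay does, a deadlock-check signal may be sent and the loop
    waits again. Once the standby delay is reached, one cancellation is sent.
    """
    if deadlock_timeout_ms <= 0:
        raise ValueError("deadlock_timeout_ms must be positive")
    signals = 0
    now = 0
    sent_deadlock_check = False
    while now + deadlock_timeout_ms < standby_delay_ms:
        now += deadlock_timeout_ms  # the deadlock timeout fires first
        if repeat_deadlock_check or not sent_deadlock_check:
            signals += 1  # deadlock-check signal
            sent_deadlock_check = True
    return signals + 1  # cancellation signal at the standby delay


# Settings inferred from the log lines in the repro (an assumption).
print("unpatched:", modeled_sigusr1_count(100, 5000, repeat_deadlock_check=True))
print("patched:  ", modeled_sigusr1_count(100, 5000, repeat_deadlock_check=False))
```

The model gives 50 versus 2, close to the 51 versus 2 that Ilmar measured. The exact unpatched count depends on timing details the model leaves out.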

[1] https://www.postgresql.org/message-id/flat/7685519a-0bf9-4e17-93ca-7e3aa10fa29c%40gmail.com
[2] https://www.postgresql.org/message-id/flat/44c24dcf-5710-410f-b1b6-d10b315f3d51%40postgrespro.ru

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
