Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits

From: Ilmar Yunusov <tanswis42(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: JoongHyuk Shin <sjh910805(at)gmail(dot)com>
Subject: Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
Date: 2026-06-03 10:22:19
Message-ID: 178048213946.1017.10427695804113210415.pgcf@coridan.postgresql.org
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi,

I looked at v3 again, this time on Linux, focusing on the repeated SIGUSR1
behavior and the log_recovery_conflict_waits timing issue I reported for v2.

I applied the v3 patch from JoongHyuk's 2026-06-03 message on top of
origin/master at commit f2081a7800f1696cb0415bacd655cb41b7b9ca63.

The patch applies cleanly with git am, and git diff --check reports no
issues.

I built with:

./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu --enable-tap-tests
make -s -j3
make -s install

The build and install succeeded.

The new targeted TAP test passed:

make -C src/test/recovery check PROVE_TESTS=t/054_bufferpin_conflict_log_timing.pl

Result:

t/054_bufferpin_conflict_log_timing.pl .. ok
All tests successful.
Files=1, Tests=3
Result: PASS

I also ran the full recovery TAP suite:

make -C src/test/recovery check

That passed too:

All tests successful.
Files=53, Tests=633
Result: PASS

Six tests were skipped because injection points were not supported by this
build.

For the signal behavior, I ran the same buffer-pin conflict reproducer under
strace on the standby postmaster and its children:

strace -ff -qq -e trace=kill,tgkill,tkill

The count below is for kill/tgkill/tkill(..., SIGUSR1) syscalls during the
conflict window, after subtracting signals already seen before VACUUM FREEZE.
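The review does not include the script used to produce the count. A minimal sketch of one way to take it from the strace output, assuming strace's default line format (optionally prefixed with "[pid N]" when following children) and a hypothetical snapshot of the trace captured just before VACUUM FREEZE as the baseline. The helper names are illustrative only:

```python
import re

# Matches kill/tgkill/tkill calls whose signal argument is SIGUSR1, e.g.
#   [pid 101] kill(202, SIGUSR1) = 0
#   [pid 101] tgkill(202, 202, SIGUSR1) = 0
#   [pid 101] tkill(202, SIGUSR1 <unfinished ...>
# The matching "<... tkill resumed>" line has no "(" after the syscall name,
# so an interrupted call is counted only once.
SIGUSR1_CALL = re.compile(r"\b(?:kill|tgkill|tkill)\([^)]*\bSIGUSR1\b")


def count_sigusr1(lines):
    """Count SIGUSR1-sending kill/tgkill/tkill syscalls in strace output lines."""
    return sum(1 for line in lines if SIGUSR1_CALL.search(line))


def sigusr1_delta(baseline_lines, final_lines):
    """SIGUSR1 syscalls seen in the final trace but not yet in the baseline.

    Hypothetical helper: assumes baseline_lines is a prefix snapshot of the
    same trace, taken just before the conflict is triggered.
    """
    return count_sigusr1(final_lines) - count_sigusr1(baseline_lines)
```

Subtracting a baseline rather than filtering by timestamp keeps the sketch independent of strace's -t/-tt options, at the cost of requiring a snapshot at the right moment.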

On unpatched master:

sigusr1_delta=51
recovery still waiting after 100.442 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.455 ms: recovery conflict on buffer pin

With v3:

sigusr1_delta=2
recovery still waiting after 100.479 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.778 ms: recovery conflict on buffer pin
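The log lines above can also be checked mechanically. A hypothetical helper (not part of the v3 patch or its TAP test) that extracts the logged wait times and checks that the "recovery still waiting" message fires shortly after deadlock_timeout. The 100 ms deadlock_timeout is inferred from the timings above rather than stated in this review, and the 50 ms slack is an arbitrary assumption:

```python
import re

# Matches the log_recovery_conflict_waits messages quoted above.
WAIT_LOG = re.compile(
    r"recovery (still|finished) waiting after ([0-9]+(?:\.[0-9]+)?) ms: "
    r"recovery conflict on buffer pin"
)


def conflict_wait_times(log_lines):
    """Return {'still': ms, 'finished': ms}, keeping the first match of each kind."""
    times = {}
    for line in log_lines:
        m = WAIT_LOG.search(line)
        if m and m.group(1) not in times:
            times[m.group(1)] = float(m.group(2))
    return times


def still_waiting_logged_on_time(times, deadlock_timeout_ms, slack_ms=50.0):
    """True if the 'still waiting' log appeared within slack_ms after deadlock_timeout."""
    still = times.get("still")
    if still is None:
        return False
    return deadlock_timeout_ms <= still <= deadlock_timeout_ms + slack_ms
```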

I interpret the two SIGUSR1 syscalls under v3 as a single deadlock-check
signal plus the final cancellation signal sent when
max_standby_streaming_delay expires. So in this repro, v3 eliminates the
deadlock-check signals that unpatched master repeats every deadlock_timeout,
while still emitting the "recovery still waiting" log at about deadlock_timeout.
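As a rough cross-check of that interpretation, a back-of-the-envelope model, not derived from the PostgreSQL source. It assumes deadlock_timeout = 100 ms and max_standby_streaming_delay = 5 s (inferred from the logged wait times, not stated here), and a single signalled backend. Exact counts can differ by one or so because of timer re-arming drift:

```python
def expected_sigusr1(deadlock_timeout_ms, max_delay_ms, repeat_deadlock_check):
    """Rough SIGUSR1 count for one buffer-pin conflict with one signalled backend.

    repeat_deadlock_check=True models unpatched master: one deadlock-check
    signal per deadlock_timeout tick until the standby delay expires.
    repeat_deadlock_check=False models v3: at most one deadlock-check signal.
    Either way, one cancellation signal is sent when the delay expires.
    """
    if repeat_deadlock_check:
        deadlock_checks = max_delay_ms // deadlock_timeout_ms
    else:
        deadlock_checks = 1 if deadlock_timeout_ms < max_delay_ms else 0
    return deadlock_checks + 1  # +1 for the final cancellation signal
```

Under these assumptions the model gives 51 for unpatched master and 2 for v3, which is consistent with the measured sigusr1_delta values.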

I did not find any new issues in the code paths I checked.

I have not reviewed the backpatching question, and I did not run
installcheck-world.

Regards,
Ilmar Yunusov

The new status of this patch is: Ready for Committer
