| From: | Ilmar Yunusov <tanswis42(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Cc: | JoongHyuk Shin <sjh910805(at)gmail(dot)com> |
| Subject: | Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits |
| Date: | 2026-06-03 10:22:19 |
| Message-ID: | 178048213946.1017.10427695804113210415.pgcf@coridan.postgresql.org |
| Lists: | pgsql-hackers |
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
Hi,
I looked at v3 again, this time on Linux, focusing on the repeated SIGUSR1
behavior and the log_recovery_conflict_waits timing issue I reported for v2.
I used the v3 attachment from JoongHyuk's 2026-06-03 message, on
origin/master at f2081a7800f1696cb0415bacd655cb41b7b9ca63.
The patch applies cleanly with git am, and git diff --check reports no
issues.
I built with:
./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu --enable-tap-tests
make -s -j3
make -s install
That passed.
The new targeted TAP test passed:
make -C src/test/recovery check PROVE_TESTS=t/054_bufferpin_conflict_log_timing.pl
Result:
t/054_bufferpin_conflict_log_timing.pl .. ok
All tests successful.
Files=1, Tests=3
Result: PASS
I also ran the full recovery TAP suite:
make -C src/test/recovery check
That passed too:
All tests successful.
Files=53, Tests=633
Result: PASS
Six tests were skipped because injection points were not supported by this
build.
For the signal behavior, I ran the same buffer-pin conflict reproducer under
strace on the standby postmaster and its children:
strace -ff -qq -e trace=kill,tgkill,tkill
The count below is for kill/tgkill/tkill(..., SIGUSR1) syscalls during the
conflict window, after subtracting signals already seen before VACUUM FREEZE.
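A minimal sketch of how such a delta could be computed from the strace output, not the script actually used for this review. It assumes the trace has been captured as text twice (once just before VACUUM FREEZE, once after the conflict resolved) and that each completed call appears on a single line; the function names and sample lines are hypothetical.

```python
import re

# Matches kill/tkill/tgkill calls whose last argument is SIGUSR1, e.g.
#   kill(12345, SIGUSR1) = 0
#   tgkill(100, 101, SIGUSR1) = 0
# Calls split across "<unfinished ...>" / "resumed" lines are not handled.
SIGUSR1_CALL = re.compile(r"\b(?:kill|tkill|tgkill)\((?:[^()]*,\s*)?SIGUSR1\)")


def count_sigusr1(lines):
    """Count trace lines that record a SIGUSR1 being sent."""
    return sum(1 for line in lines if SIGUSR1_CALL.search(line))


def sigusr1_delta(before_lines, after_lines):
    """Signals sent during the conflict window.

    before_lines: trace snapshot taken before VACUUM FREEZE (baseline)
    after_lines:  full trace after the conflict has been resolved
    """
    return count_sigusr1(after_lines) - count_sigusr1(before_lines)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real run.
    baseline = [
        "kill(4001, SIGUSR1) = 0",
        "kill(4002, SIGTERM) = 0",
    ]
    full_trace = baseline + [
        "kill(4003, SIGUSR1) = 0",
        "tgkill(4003, 4003, SIGUSR1) = 0",
    ]
    print("sigusr1_delta=%d" % sigusr1_delta(baseline, full_trace))
```

With strace -ff and an output file prefix, the per-process files would need to be concatenated before being passed in.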
On unpatched master:
sigusr1_delta=51
recovery still waiting after 100.442 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.455 ms: recovery conflict on buffer pin
With v3:
sigusr1_delta=2
recovery still waiting after 100.479 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.778 ms: recovery conflict on buffer pin
I interpret the two v3 SIGUSR1 syscalls as the one deadlock-check signal and
the final cancellation signal at max_standby_streaming_delay. So in this
repro, v3 removes the repeated deadlock-check signals every deadlock_timeout,
while keeping the "recovery still waiting" log near deadlock_timeout.
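As a rough sanity check of that interpretation, the observed counts can be compared with a back-of-envelope model. The timeout values below are assumptions inferred from the log timings above (about 100 ms and about 5000 ms), not settings stated in this message, and the model is a simplification rather than a description of the actual timer code.

```python
def expected_sigusr1(deadlock_timeout_ms, max_delay_ms, repeats):
    """Approximate number of SIGUSR1 signals for one buffer-pin conflict wait.

    repeats=True  models a deadlock-check signal every deadlock_timeout.
    repeats=False models a single deadlock-check signal.
    Either way, one final cancellation signal is added.
    """
    if repeats:
        deadlock_checks = max_delay_ms // deadlock_timeout_ms
    else:
        deadlock_checks = 1
    return deadlock_checks + 1  # plus the final cancellation signal


# Assumed values: deadlock_timeout ~ 100 ms, max_standby_streaming_delay ~ 5000 ms.
print(expected_sigusr1(100, 5000, repeats=True))   # 51
print(expected_sigusr1(100, 5000, repeats=False))  # 2
```

Under those assumptions the model gives 51 and 2, which agrees with the measured sigusr1_delta values for unpatched master and v3.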
I did not find a new issue in the checked path.
I have not reviewed the backpatching question, and I did not run
installcheck-world.
Regards,
Ilmar Yunusov
The new status of this patch is: Ready for Committer