Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits

From: Ilmar Y <tanswis42(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: JoongHyuk Shin <sjh910805(at)gmail(dot)com>
Subject: Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
Date: 2026-05-30 11:51:30
Message-ID: 178014189038.1017.17670191663605578870.pgcf@coridan.postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, failed
Spec compliant: not tested
Documentation: not tested

Hi,

I looked at v2, focusing on the latest question about
log_recovery_conflict_waits behavior.

The patch applies cleanly on current master at
db5ed03217b9c238703df8b4b286115d6e940488. git diff --check reports no
issues. I built both master and v2 locally with:

./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
make -s -j8
make -s install

Both builds passed.

I tried to check the case Zsolt described. In my local repro, the final
cancellation still happens at about max_standby_streaming_delay in both
master and v2. The visible difference is that v2 delays the first
log_recovery_conflict_waits "recovery still waiting" message.

I used a small primary/standby setup based on the buffer-pin conflict pattern
from src/test/recovery/t/031_recovery_conflict.pl:

log_recovery_conflict_waits = on
deadlock_timeout = 100ms
max_standby_streaming_delay = 5s

The scenario is:

1. Create a small table.
2. Generate an aborted tuple on the primary and replay it on the standby.
3. Open a cursor on the standby and fetch one row, leaving the transaction
idle with the cursor open.
4. Run VACUUM FREEZE on the primary.

On current master, the standby log shows:

recovery still waiting after 101.060 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.324 ms: recovery conflict on buffer pin

With v2, the standby log shows:

recovery still waiting after 5001.064 ms: recovery conflict on buffer pin
terminating connection due to conflict with recovery
recovery finished waiting after 5001.535 ms: recovery conflict on buffer pin

So in this repro the early "still waiting" log at deadlock_timeout is lost,
and the first waiting log is emitted only when the wait is already being
resolved at about max_standby_streaming_delay.

This seems to come from the new second ProcWaitForSignal() inside
ResolveRecoveryConflictWithBufferPin(). After the deadlock timeout fires, v2
sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and waits again internally. For a
non-deadlock buffer pin holder, LockBufferForCleanup() does not get control
back after deadlock_timeout, so it cannot emit the early recovery conflict
log message.

This looks similar to the reason ResolveRecoveryConflictWithLock() has the
logging_conflict argument: if the conflict still needs to be logged, it returns
after sending the deadlock-check signal, and only avoids the repeated wait on
the later call after the conflict has been logged.

Should ResolveRecoveryConflictWithBufferPin() preserve the same behavior, for
example by skipping the new second wait until LockBufferForCleanup() has had a
chance to emit the first log_recovery_conflict_waits message?

I have not checked the repeated SIGUSR1 count with strace in my local macOS
environment, and I have not reviewed the backpatching question.

Regards,
Ilmar Yunusov

The new status of this patch is: Waiting on Author

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilmar Y 2026-05-30 12:14:33 Re: [PATCH] Add pg_get_policy_ddl() function to reconstruct CREATE POLICY statement
Previous Message Ilmar Y 2026-05-30 09:44:08 Re: RFC: Allow EXPLAIN to Output Page Fault Information