Re: Deadlock detector fails to activate on a hot standby replica

From: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Deadlock detector fails to activate on a hot standby replica
Date: 2026-07-01 15:57:58
Message-ID: 5c50d50c-5a9c-4a9e-9592-183cf4de1c80@postgrespro.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Xuneng Zhou

Thank you very much for such a comprehensive feedback.

On 6/30/26 06:14, Xuneng Zhou wrote:

3)

> Another potential issue of doing safe checks here is the order of
> assert and error out:
> In assert build, if the condition is true, then the header lock and
> content lock would not be released properly. So we may need to adjust
> the order.

> This leads me to wonder -- would it be better to let this help
> function to do less, like just for registration and let the callers to
> handle others.

I agree with your comment. Furthermore, I do not like the approach with
locking and unlocking in different functions. But, I see this approach
is used in some other places in the code. I should think how to improve it.

I have some doubts about the code where LockBufferForCleanup (bufmgr.c)
calls ResolveRecoveryConflictWithBufferPin (standby.c), but the latter
function calls BufferIsReadyForCleanup (bufmgr.c). There is an idea to
refactor it in the future, because these functions are closely coupled.
May be unite them or move ResolveRecoveryConflictWithBufferPin into
bufmgr.c...

4) the already-expired standby-delay path may not wake new pin holders

Agree, thank you. It seems it is a kind of an optimization to avoid
activation of timeouts. Is it enought to return after
SendRecoveryConflictWithBufferPin, once we sent a signal to a conflicting
backend? Not sure, we have to wait with ProcWaitForSignal. Of course, the
same check should be in the loop, when timeouts are already activated. If
another backend pins the buffer after the broadcast, it will be handled
in the parent function.

5) The name of BufferIsReadyForCleanup seems broader than the actual protocol.

What about to move RegisterPinCountWaiter out of this function? May be, rename
it into BufferGetRefCount() or something else like it? In this case, its
semantics will be simple and well-defined, and it can be re-used in some other
places. Another thing is to add a secong function argument like
doRegisterWaiter.

We may also remove BufferIsReadyForCleanup but put its code into
ResolveRecoveryConflictWithBufferPin.

I have no good solution right now. Let me please think about it.

6)

> From personal experience, the message seems a bit hard to follow if
> one does not read the related discussion. I like your analysis of the
> problem at the start of this thread [1]. Would it be sensible to
> summarize and reuse some part of it for the message?

> The motivation of this patch is to fix deadlock detector activation,
> but the wait-loop change has a slightly broader benefit. It also
> improves the standby timeout path indirectly by avoiding needless
> timeout teardown and re-entry on unrelated wakeups, and by keeping any
> already-fired timeout state until the loop handles it. Should we
> mention this as an additional benefit, or even frame the message
> broader like:
> Fix the buffer-pin recovery-conflict wait loop.

Thank you for such a detailed comment. I would like to propose the following
commit message:

Fix recovery-conflict wait loop for buffer pins on hot standby

When a deadlock occurs between the startup process and a backend during
recovery on a hot standby, the deadlock detector may fail to activate if
deadlock_timeout exceeds log_startup_progress_interval. Due to an
optimization in timeout.c that avoids redundant setitimer() calls,
periodic SIGALRM signals from the progress timer can arrive before the
deadlock timeout expires. These signals wake the startup process
prematurely, causing ResolveRecoveryConflictWithBufferPin to reset all
timeouts and retry without ever triggering the deadlock detector. This
results in an infinite loop in LockBufferForCleanup and stalls recovery
indefinitely, particularly when max_standby_streaming_delay = -1.

The timeout.c subsystem uses an optimization that avoids redundant
setitimer() calls when the nearest active timeout's expiration time is
later than the currently armed timer. As a result, a SIGALRM from a
previously set timer (e.g., log_startup_progress_interval) can fire when
no currently active timeout has actually been reached. This periodic
signal wakes ResolveRecoveryConflictWithBufferPin prematurely, resetting
the timeout loop without ever triggering the deadlock detector at
deadlock_timeout. The deadlock timeout is perpetually re-armed but never
fires because each periodic SIGALRM arrives before it expires.

This patch fixes the issue by making ResolveRecoveryConflictWithBufferPin
ignore unintended SIGALRM signals that do not correspond to any currently
enabled timeout, ensuring the deadlock timeout fires correctly and the
conflicting backend is signaled to check for deadlocks.

It improves the standby timeout path by reducing unnecessary timeout
teardown and re-arming on unrelated wakeups, preserving the activated
timer state until the wait loop completes. This makes the deadlock
timeout behavior consistent with user expectations.

It seems, the patch should be completely rewritten. I will try to
prepare a new patch version.

With best regards,
Vitaly

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Tom Lane 2026-07-01 15:55:48 Re: Coverage (lcov) failing with inconsistent error in versions 2.x