Quick Links

Re: Deadlock detector fails to activate on a hot standby replica

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Deadlock detector fails to activate on a hot standby replica
Date:	2026-05-20 16:26:53
Message-ID:	CAHGQGwExVv1-mtPEX-JFGBXNY2r8DfP+sQVPD8GpDTefW0KdsQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Jan 23, 2026 at 8:52 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
>
> Dear Hackers,
>
> I would like to propose a patch that fixes the problem, which has the roots in
> the possibility of spontaneous SIGALRM signals when waiting for some timeouts.
> The idea of the patch - ignore spontaneous SIGALRM signals and continue waiting
> for expected timeouts or buffer unpinning by the conflicting backend. This
> patch is not a final version. I plan to add a tap-test for this case.

Thanks for the patch!

#include "storage/sinvaladt.h"
#include "storage/standby.h"
+#include "storage/buf_internals.h"

This include should be placed in alphabetical order.

+ * We assume that only UnpinBuffer() and the timeout requests established
+ * above can wake us up here. WakeupRecovery() called by walreceiver or
+ * SIGHUP signal handler, etc cannot do that because it uses the different
+ * latch from that ProcWaitForSignal() waits on.

As your investigation showed, that assumption does not seem to hold. If so,
I think something like the following would be more accurate:

---------------------------------
ProcWaitForSignal() can also wake up for unrelated reasons, so recheck
whether we're still the waiter after each wakeup. If we are and no timeout
fired, continue waiting without resetting the active timeouts.
---------------------------------

+ uint32 buf_refcount = BUF_STATE_GET_REFCOUNT(buf_state);
<snip>
+ if (buf_refcount > 1)
+ continue;

Wouldn't it be better to check explicitly whether we're still the waiter,
instead of using BUF_STATE_GET_REFCOUNT()?

(buf_state & BM_PIN_COUNT_WAITER) != 0 &&
bufHdr->wait_backend_pgprocno == MyProcNumber

The current control flow in the loop feels a bit hard to follow.
Would something like the following be simpler?

for (;;)
{
....
ProcWaitForSignal(...);

if (!StillWaitingForBufferPin(...))
break;

if (got_standby_delay_timeout)
{
SendRecoveryConflictWithBufferPin(...);
break;
}
else if (got_standby_deadlock_timeout)
{
SendRecoveryConflictWithBufferPin(...);
break;
}
}

Regards,

--
Fujii Masao

In response to

Re: Deadlock detector fails to activate on a hot standby replica at 2026-01-23 11:51:49 from Vitaly Davydov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Arseniy Mukhin	2026-05-20 17:27:51	Re: [PATCH] Fix LISTEN startup race with direct advancement
Previous Message	Lucas Jeffrey	2026-05-20 13:14:36	[PATCH] Add reentrancy guards in ri_triggers.c