| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Deadlock detector fails to activate on a hot standby replica |
| Date: | 2026-05-20 16:26:53 |
| Message-ID: | CAHGQGwExVv1-mtPEX-JFGBXNY2r8DfP+sQVPD8GpDTefW0KdsQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Jan 23, 2026 at 8:52 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
>
> Dear Hackers,
>
> I would like to propose a patch that fixes the problem, which has the roots in
> the possibility of spontaneous SIGALRM signals when waiting for some timeouts.
> The idea of the patch - ignore spontaneous SIGALRM signals and continue waiting
> for expected timeouts or buffer unpinning by the conflicting backend. This
> patch is not a final version. I plan to add a tap-test for this case.
Thanks for the patch!
#include "storage/sinvaladt.h"
#include "storage/standby.h"
+#include "storage/buf_internals.h"
This include should be placed in alphabetical order.
+ * We assume that only UnpinBuffer() and the timeout requests established
+ * above can wake us up here. WakeupRecovery() called by walreceiver or
+ * SIGHUP signal handler, etc cannot do that because it uses the different
+ * latch from that ProcWaitForSignal() waits on.
As your investigation showed, that assumption does not seem to hold. If so,
I think something like the following would be more accurate:
---------------------------------
ProcWaitForSignal() can also wake up for unrelated reasons, so recheck
whether we're still the waiter after each wakeup. If we are and no timeout
fired, continue waiting without resetting the active timeouts.
---------------------------------
+ uint32 buf_refcount = BUF_STATE_GET_REFCOUNT(buf_state);
<snip>
+ if (buf_refcount > 1)
+ continue;
Wouldn't it be better to check explicitly whether we're still the waiter,
instead of using BUF_STATE_GET_REFCOUNT()?
(buf_state & BM_PIN_COUNT_WAITER) != 0 &&
bufHdr->wait_backend_pgprocno == MyProcNumber
The current control flow in the loop feels a bit hard to follow.
Would something like the following be simpler?
for (;;)
{
....
ProcWaitForSignal(...);
if (!StillWaitingForBufferPin(...))
break;
if (got_standby_delay_timeout)
{
SendRecoveryConflictWithBufferPin(...);
break;
}
else if (got_standby_deadlock_timeout)
{
SendRecoveryConflictWithBufferPin(...);
break;
}
}
Regards,
--
Fujii Masao
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Arseniy Mukhin | 2026-05-20 17:27:51 | Re: [PATCH] Fix LISTEN startup race with direct advancement |
| Previous Message | Lucas Jeffrey | 2026-05-20 13:14:36 | [PATCH] Add reentrancy guards in ri_triggers.c |