Re: Deadlock detector fails to activate on a hot standby replica

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Deadlock detector fails to activate on a hot standby replica
Date: 2026-06-18 02:48:33
Message-ID: CAHGQGwF01sOHnh6_9yDWiAdBMNnGWsTpM89+8XeztMjV=xk+cw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 12, 2026 at 6:23 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Fri, Jun 12, 2026 at 3:00 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
> >
> > Dear Fujii Masao,
> >
> > On 6/10/26 11:42, Fujii Masao wrote:
> > > Could you review the v4 patches?
> >
> > I've got the issue with BM_PIN_COUNT_WAITER. The origin of the issue is that
> > the other backend resets the flag before sending the notification (signal)
> > to the waiter. I agree with your changes where this flag is set again.
> >
> > I would like to propose BufferIsReadyForCleanup function name instead of
> > BufferShouldWaitForCleanupLockForStandby (and change the returning value
> > semantics). There are other possible alternatives: BufferIsReadyForWriting,
> > BufferIsReadyForExclusiveAccess or something else. The code will look like
> > below:
> >
> > if (BufferIsReadyForCleanup(bufid + 1))
> > break;
>
> Okay.

I've updated the patch accordingly.

> Also, the current test assumes that
> log_startup_progress_interval is effective during standby WAL replay.
> But as discussed in [1], that assumption may actually be wrong. It
> seems to work today, but that could just be accidental and might
> change in the future. So I think we should consider a testing approach
> that does not depend on that assumption.
>
> At the moment, the only alternative I can think of is to send SIGALRM
> periodically to the startup process directly from the test, though I'm
> not sure that's a good approach either...

I still haven't come up with a good way to test this without depending on
log_startup_progress_interval being active during standby WAL replay.
Even so, I think the fix is still worth committing because it addresses
a real issue. So how about committing 0001 patch first, even without
a regression test?

Regards,

--
Fujii Masao

Attachment Content-Type Size
v5-0001-Fix-deadlock-detector-activation-in-a-recovery-co.patch application/octet-stream 8.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ayush Tiwari 2026-06-18 02:53:51 [PATCH] Fix two incorrect code comments
Previous Message Tatsuo Ishii 2026-06-18 02:44:40 Re: IGNORE/RESPECT NULLS can be specified for (prokind == 'f').