| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Deadlock detector fails to activate on a hot standby replica |
| Date: | 2026-06-18 02:48:33 |
| Message-ID: | CAHGQGwF01sOHnh6_9yDWiAdBMNnGWsTpM89+8XeztMjV=xk+cw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Jun 12, 2026 at 6:23 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Fri, Jun 12, 2026 at 3:00 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
> >
> > Dear Fujii Masao,
> >
> > On 6/10/26 11:42, Fujii Masao wrote:
> > > Could you review the v4 patches?
> >
> > I've got the issue with BM_PIN_COUNT_WAITER. The origin of the issue is that
> > the other backend resets the flag before sending the notification (signal)
> > to the waiter. I agree with your changes where this flag is set again.
> >
> > I would like to propose BufferIsReadyForCleanup function name instead of
> > BufferShouldWaitForCleanupLockForStandby (and change the returning value
> > semantics). There are other possible alternatives: BufferIsReadyForWriting,
> > BufferIsReadyForExclusiveAccess or something else. The code will look like
> > below:
> >
> > if (BufferIsReadyForCleanup(bufid + 1))
> > break;
>
> Okay.
I've updated the patch accordingly.
> Also, the current test assumes that
> log_startup_progress_interval is effective during standby WAL replay.
> But as discussed in [1], that assumption may actually be wrong. It
> seems to work today, but that could just be accidental and might
> change in the future. So I think we should consider a testing approach
> that does not depend on that assumption.
>
> At the moment, the only alternative I can think of is to send SIGALRM
> periodically to the startup process directly from the test, though I'm
> not sure that's a good approach either...
I still haven't come up with a good way to test this without depending on
log_startup_progress_interval being active during standby WAL replay.
Even so, I think the fix is still worth committing because it addresses
a real issue. So how about committing 0001 patch first, even without
a regression test?
Regards,
--
Fujii Masao
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-Fix-deadlock-detector-activation-in-a-recovery-co.patch | application/octet-stream | 8.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ayush Tiwari | 2026-06-18 02:53:51 | [PATCH] Fix two incorrect code comments |
| Previous Message | Tatsuo Ishii | 2026-06-18 02:44:40 | Re: IGNORE/RESPECT NULLS can be specified for (prokind == 'f'). |