| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Deadlock detector fails to activate on a hot standby replica |
| Date: | 2026-06-05 11:58:11 |
| Message-ID: | CAHGQGwGNfwDH4hF_LnXzjnUd3d8+PF=f3dRbNn9XkL1hFbvRRQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, May 27, 2026 at 12:01 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> + if (got_standby_delay_timeout)
> + SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN);
> + else if (got_standby_deadlock_timeout)
> + {
>
> Shouldn't we break out of the loop when either got_standby_delay_timeout or
> got_standby_deadlock_timeout becomes true? Otherwise, the loop continues with
> those flags still set, which could cause SendRecoveryConflictWithBufferPin() to
> be called unnecessarily in the subsequent cycles.
>
>
> + if (BufferGetRefCount(buffer) <= 1)
>
> Should this be "BufferGetRefCount(buffer) == 1" instead? I don't think
> BufferGetRefCount(buffer) should ever return 0 here. If that's correct,
> would it make sense to explicitly detect that case, for example:
>
> -----------------
> uint32 refcount = BufferGetRefCount(buffer);
>
> Assert(refcount > 0);
>
> if (refcount == 0)
> elog(ERROR, "buffer refcount dropped to zero while waiting for cleanup lock");
>
> if (refcount == 1)
> break;
> -----------------
I've updated the patch based on these comments.
Attached is the latest version.
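
To make the two quoted review points concrete, here is a minimal standalone sketch of the intended control flow. It is a simulation, not code from the attached patch: `WaitState`, `WaitOutcome`, `WakeupFn`, `wait_for_cleanup()` and the two wakeup helpers are hypothetical names, and the counters stand in for the real conflict signalling and deadlock check. It only illustrates leaving the loop as soon as a timeout flag is handled, and treating a refcount of exactly 1 as "only the waiter still holds a pin".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated state; none of this is PostgreSQL's real API. */
typedef struct
{
    uint32_t refcount;        /* pins on the buffer, including the waiter's own */
    bool     delay_timeout;   /* stands in for got_standby_delay_timeout */
    bool     deadlock_timeout;/* stands in for got_standby_deadlock_timeout */
    int      conflicts_sent;  /* how often a conflict signal would be sent */
    int      deadlock_checks; /* how often a deadlock check would be requested */
} WaitState;

typedef enum
{
    WAIT_LOCK_AVAILABLE,
    WAIT_DELAY_EXPIRED,
    WAIT_DEADLOCK_CHECKED,
    WAIT_GAVE_UP
} WaitOutcome;

/* Simulates one sleep/wakeup cycle; may change the refcount or set a flag. */
typedef void (*WakeupFn) (WaitState *st, int cycle);

static WaitOutcome
wait_for_cleanup(WaitState *st, WakeupFn wakeup, int max_cycles)
{
    for (int cycle = 0; cycle < max_cycles; cycle++)
    {
        /* The waiter holds one pin itself, so zero would indicate a bug. */
        assert(st->refcount > 0);

        if (st->refcount == 1)
            return WAIT_LOCK_AVAILABLE;

        if (st->delay_timeout)
        {
            st->conflicts_sent++;
            st->delay_timeout = false;
            /* Leave the loop so the flag is not acted on again. */
            return WAIT_DELAY_EXPIRED;
        }
        else if (st->deadlock_timeout)
        {
            st->deadlock_checks++;
            st->deadlock_timeout = false;
            return WAIT_DEADLOCK_CHECKED;
        }

        wakeup(st, cycle);
    }
    return WAIT_GAVE_UP;
}

/* Example wakeups: the delay timeout fires on the second wakeup ... */
static void
delay_fires_on_second_wakeup(WaitState *st, int cycle)
{
    if (cycle == 1)
        st->delay_timeout = true;
}

/* ... or the other backend drops its pin on the first wakeup. */
static void
pin_released_on_first_wakeup(WaitState *st, int cycle)
{
    if (cycle == 0)
        st->refcount--;
}
```

Calling `wait_for_cleanup()` with a state whose refcount starts at 2 and `delay_fires_on_second_wakeup` yields `WAIT_DELAY_EXPIRED` with exactly one simulated conflict signal. With `pin_released_on_first_wakeup` it yields `WAIT_LOCK_AVAILABLE` and sends none.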
I removed the TAP test from this patch for now. I'll consider adding
a test for this separately later.

BTW, I'm just wondering whether ResolveRecoveryConflictWithLock()
might have the same issue. I need to investigate that further.

Regards,
--
Fujii Masao
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Fix-deadlock-detector-activation-in-a-recovery-co.patch | application/octet-stream | 6.5 KB |