Re: Deadlock detector fails to activate on a hot standby replica

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Deadlock detector fails to activate on a hot standby replica
Date: 2026-06-05 11:58:11
Message-ID: CAHGQGwGNfwDH4hF_LnXzjnUd3d8+PF=f3dRbNn9XkL1hFbvRRQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 27, 2026 at 12:01 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> + if (got_standby_delay_timeout)
> +     SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN);
> + else if (got_standby_deadlock_timeout)
> + {
>
> Shouldn't we break out of the loop when either got_standby_delay_timeout or
> got_standby_deadlock_timeout becomes true? Otherwise, the loop continues with
> those flags still set, which could cause SendRecoveryConflictWithBufferPin() to
> be called unnecessarily in the subsequent cycles.
>
>
> + if (BufferGetRefCount(buffer) <= 1)
>
> Should this be "BufferGetRefCount(buffer) == 1" instead? I don't think
> BufferGetRefCount(buffer) should ever return 0 here. If that's correct,
> would it make sense to explicitly detect that case, for example:
>
> -----------------
> uint32 refcount = BufferGetRefCount(buffer);
>
> Assert(refcount > 0);
>
> if (refcount == 0)
>     elog(ERROR, "buffer refcount dropped to zero while waiting for cleanup lock");
>
> if (refcount == 1)
>     break;
> -----------------

I've updated the patch based on these comments.
Attached is the latest version.
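
As a rough illustration of the first review point above (this is not the
code in the attached patch), the standalone sketch below simulates a wait
loop in which a timeout flag, once set, stays set. The function names
simulate_wait_loop() and send_conflict_signal(), the cycle counts, and the
break_on_timeout switch are all hypothetical stand-ins; only the two flag
names are taken from the quoted snippet. Without the break, the signal is
resent on every later cycle; with it, the signal is sent once.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names borrowed from the quoted snippet; everything else is hypothetical. */
static bool got_standby_delay_timeout = false;
static bool got_standby_deadlock_timeout = false;

/* Counts how many times the simulated conflict signal was sent. */
static int conflict_signals_sent = 0;

/* Hypothetical stand-in for SendRecoveryConflictWithBufferPin(). */
static void
send_conflict_signal(void)
{
    conflict_signals_sent++;
}

/*
 * Run a simulated wait loop for at most max_cycles iterations.  The delay
 * timeout "fires" at iteration fire_at and its flag is never cleared, which
 * mirrors the situation described in the review comment.  When
 * break_on_timeout is true, the loop exits right after handling the timeout.
 * Returns the number of conflict signals sent.
 */
static int
simulate_wait_loop(int max_cycles, int fire_at, bool break_on_timeout)
{
    assert(max_cycles >= 0);

    got_standby_delay_timeout = false;
    got_standby_deadlock_timeout = false;
    conflict_signals_sent = 0;

    for (int cycle = 0; cycle < max_cycles; cycle++)
    {
        /* A real timeout handler would set this flag asynchronously. */
        if (cycle == fire_at)
            got_standby_delay_timeout = true;

        if (got_standby_delay_timeout || got_standby_deadlock_timeout)
        {
            send_conflict_signal();

            /* Leaving the loop here avoids resending on later cycles. */
            if (break_on_timeout)
                break;
        }
    }

    return conflict_signals_sent;
}
```

With five cycles and the timeout firing at cycle 1, the variant without
the break sends the signal on cycles 1 through 4, while the variant with
the break sends it exactly once.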

I removed the TAP test from this patch for now. I'll consider adding
a test for this separately later.

BTW, I'm just wondering whether ResolveRecoveryConflictWithLock()
might have the same issue. I need to investigate that further.

Regards,

--
Fujii Masao

Attachment: v3-0001-Fix-deadlock-detector-activation-in-a-recovery-co.patch (application/octet-stream, 6.5 KB)
