Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored

From: Anthony Hsu <erwaman(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored
Date: 2025-06-27 06:11:34
Message-ID: CALQc50idegfrVB-wYe33XF0xPOZ3-BX9KaQQjS-5DxioHix4Yg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Jun 26, 2025 at 10:05 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> On Thu, Jun 19, 2025 at 9:28 PM Anthony Hsu <erwaman(at)gmail(dot)com> wrote:
> >
> > Yes, another option might be to change the wakeup logic so that the
> startup process is still woken up even if new processes have acquired the
> pin. The main thing is the startup process should be woken up promptly so
> that it can recheck if it can acquire the cleanup lock and if not, send
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN again to cancel any new backends.
> >
> > Before the standby limit time (default 30s) is reached,
> ResolveRecoveryConflictWithBufferPin will enable both a standby limit
> timeout and a deadlock timeout (default 1s). So if someone is holding a
> conflicting buffer pin for a long time, due to the deadlock timeout, the
> startup process will get woken up every 1s and recheck until we reach the
> standby limit, at which point it'll send the
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN. But after reaching standby limit, the
> next time the startup process does ResolveRecoveryConflictWithBufferPin, it
> only sends PROCSIG_RECOVERY_CONFLICT_BUFFERPIN without enabling the
> timeouts, which leads to the possibility of this race. So I thought a
> simple solution to address this race would be to just always enable the
> timeouts.
>
> Again looking at the code and I got confused with the code placement
> of ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN);, I mean if we are already
> behind then we broadcast 'PROCSIG_RECOVERY_CONFLICT_BUFFERPIN' and
> then wait for WAIT_EVENT_BUFFER_PIN signal, Otherwise, we will wait on
> timeouts, and once woken up by timeout we broadcast
> 'PROCSIG_RECOVERY_CONFLICT_BUFFERPIN' but now we don't bother to wait
> for WAIT_EVENT_BUFFER_PIN.

If we get woken by the standby_delay timeout, we broadcast
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and then return from this method
(ResolveRecoveryConflictWithBufferPin) back to LockBufferForCleanup, which
will then loop back to the beginning of the for loop [1]. It will then
check if it is the exclusive pinner again, and if not,
re-enter ResolveRecoveryConflictWithBufferPin, and then wait for
WAIT_EVENT_BUFFER_PIN.

I've attached a patch fixing the race condition by always enabling the
standby limit and deadlock timeouts.

[1]
https://github.com/postgres/postgres/blob/94e2e150ec72a3b37e3847be99c4aca3320c38f9/src/backend/storage/buffer/bufmgr.c#L5710

> If I am not missing something this
> behavior seems inconsistent to me.
>
> --
> Regards,
> Dilip Kumar
> Google
>

Attachment Content-Type Size
0001-Always-enable-timeouts-in-ResolveRecoveryConflictWit.patch application/octet-stream 2.5 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dilip Kumar 2025-06-27 06:26:53 Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored
Previous Message Richard Guo 2025-06-27 06:08:03 Re: BUG #18953: Planner fails to build plan for complex query with LATERAL references