From: | Anthony Hsu <erwaman(at)gmail(dot)com> |
---|---|
To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored |
Date: | 2025-06-27 06:11:34 |
Message-ID: | CALQc50idegfrVB-wYe33XF0xPOZ3-BX9KaQQjS-5DxioHix4Yg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Thu, Jun 26, 2025 at 10:05 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Thu, Jun 19, 2025 at 9:28 PM Anthony Hsu <erwaman(at)gmail(dot)com> wrote:
> >
> > Yes, another option might be to change the wakeup logic so that the
> startup process is still woken up even if new processes have acquired the
> pin. The main thing is the startup process should be woken up promptly so
> that it can recheck if it can acquire the cleanup lock and if not, send
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN again to cancel any new backends.
> >
> > Before the standby limit time (default 30s) is reached,
> ResolveRecoveryConflictWithBufferPin will enable both a standby limit
> timeout and a deadlock timeout (default 1s). So if someone is holding a
> conflicting buffer pin for a long time, due to the deadlock timeout, the
> startup process will get woken up every 1s and recheck until we reach the
> standby limit, at which point it'll send the
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN. But after reaching standby limit, the
> next time the startup process does ResolveRecoveryConflictWithBufferPin, it
> only sends PROCSIG_RECOVERY_CONFLICT_BUFFERPIN without enabling the
> timeouts, which leads to the possibility of this race. So I thought a
> simple solution to address this race would be to just always enable the
> timeouts.
>
> Again looking at the code and I got confused with the code placement
> of ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN);, I mean if we are already
> behind then we broadcast 'PROCSIG_RECOVERY_CONFLICT_BUFFERPIN' and
> then wait for WAIT_EVENT_BUFFER_PIN signal, Otherwise, we will wait on
> timeouts, and once woken up by timeout we broadcast
> 'PROCSIG_RECOVERY_CONFLICT_BUFFERPIN' but now we don't bother to wait
> for WAIT_EVENT_BUFFER_PIN.
If we get woken by the standby_delay timeout, we broadcast
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and then return from this method
(ResolveRecoveryConflictWithBufferPin) back to LockBufferForCleanup, which
will then loop back to the beginning of the for loop [1]. It will then
check if it is the exclusive pinner again, and if not,
re-enter ResolveRecoveryConflictWithBufferPin, and then wait for
WAIT_EVENT_BUFFER_PIN.
I've attached a patch fixing the race condition by always enabling the
standby limit and deadlock timeouts.
> If I am not missing something this
> behavior seems inconsistent to me.
>
> --
> Regards,
> Dilip Kumar
> Google
>
Attachment | Content-Type | Size |
---|---|---|
0001-Always-enable-timeouts-in-ResolveRecoveryConflictWit.patch | application/octet-stream | 2.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2025-06-27 06:26:53 | Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored |
Previous Message | Richard Guo | 2025-06-27 06:08:03 | Re: BUG #18953: Planner fails to build plan for complex query with LATERAL references |