From: | Anthony Hsu <erwaman(at)gmail(dot)com> |
---|---|
To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored |
Date: | 2025-06-28 23:37:13 |
Message-ID: | CALQc50i+3XaW18DzZ9y0mOXu0UtOpias5nev3GNY48J=9rX6AA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Made a few changes to my patch and attached a new version. Changes include:
* avoid sending PROCSIG_RECOVERY_CONFLICT_BUFFERPIN in a tight loop after
standby limit is reached by adding a small sleep after each send, starting
at 1ms and doubling each time up to 1s (similar to what's done
in WaitExceedsMaxStandbyDelay here [1]). Sleep time is reset in
LockBufferForCleanup once cleanup lock is successfully acquired
* don't set deadlock timeout once standby limit has passed since after
that, the standby timeout will fire immediately, so no need to set deadlock
timeout as well
Let me know if you have any thoughts or comments. I am thinking of sending
it to pgsql-hackers soon.
On Thu, Jun 26, 2025 at 11:27 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Fri, Jun 27, 2025 at 11:41 AM Anthony Hsu <erwaman(at)gmail(dot)com> wrote:
> >
> >
> >
> > On Thu, Jun 26, 2025 at 10:05 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> wrote:
> >>
> >> On Thu, Jun 19, 2025 at 9:28 PM Anthony Hsu <erwaman(at)gmail(dot)com> wrote:
> >> >
> >> > Yes, another option might be to change the wakeup logic so that the
> startup process is still woken up even if new processes have acquired the
> pin. The main thing is the startup process should be woken up promptly so
> that it can recheck if it can acquire the cleanup lock and if not, send
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN again to cancel any new backends.
> >> >
> >> > Before the standby limit time (default 30s) is reached,
> ResolveRecoveryConflictWithBufferPin will enable both a standby limit
> timeout and a deadlock timeout (default 1s). So if someone is holding a
> conflicting buffer pin for a long time, due to the deadlock timeout, the
> startup process will get woken up every 1s and recheck until we reach the
> standby limit, at which point it'll send the
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN. But after reaching standby limit, the
> next time the startup process does ResolveRecoveryConflictWithBufferPin, it
> only sends PROCSIG_RECOVERY_CONFLICT_BUFFERPIN without enabling the
> timeouts, which leads to the possibility of this race. So I thought a
> simple solution to address this race would be to just always enable the
> timeouts.
> >>
> >> Again looking at the code and I got confused with the code placement
> >> of ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN);, I mean if we are already
> >> behind then we broadcast 'PROCSIG_RECOVERY_CONFLICT_BUFFERPIN' and
> >> then wait for WAIT_EVENT_BUFFER_PIN signal, Otherwise, we will wait on
> >> timeouts, and once woken up by timeout we broadcast
> >> 'PROCSIG_RECOVERY_CONFLICT_BUFFERPIN' but now we don't bother to wait
> >> for WAIT_EVENT_BUFFER_PIN.
> >
> >
> > If we get woken by the standby_delay timeout, we broadcast
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and then return from this method
> (ResolveRecoveryConflictWithBufferPin) back to LockBufferForCleanup, which
> will then loop back to the beginning of the for loop [1]. It will then
> check if it is the exclusive pinner again, and if not, re-enter
> ResolveRecoveryConflictWithBufferPin, and then wait for
> WAIT_EVENT_BUFFER_PIN.
> >
> Oh yeah, that's correct.
>
> --
> Regards,
> Dilip Kumar
> Google
>
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Always-enable-standby-timeout-in-ResolveRecoveryC.patch | application/octet-stream | 5.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Braulio Fdo Gonzalez | 2025-06-29 03:29:44 | Re: Logical replication 'ERROR: invalid memory alloc request size 1831213792' after upgrading to 15.13 |
Previous Message | David G. Johnston | 2025-06-28 21:12:02 | Re: BUG #18971: Server passes an invalid (indirect) path in PGDATA to the external program |