Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored

From: Anthony Hsu <erwaman(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored
Date: 2025-06-19 15:58:16
Message-ID: CALQc50hKhaoqZbk=CYRKx5gK9NykGDLUqb3HzMKF+Ezm1=Lw-A@mail.gmail.com
Lists: pgsql-bugs

Yes, another option might be to change the wakeup logic so that the startup
process is still woken up even if new processes have acquired the pin. The
main thing is that the startup process should be woken promptly so that it
can recheck whether it can acquire the cleanup lock and, if not,
send PROCSIG_RECOVERY_CONFLICT_BUFFERPIN again to cancel any newly
conflicting backends.
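
For reference, the reason backend 1's unpin in step (6) doesn't help is the
wakeup check in UnpinBuffer(): it only signals the waiter registered via
BM_PIN_COUNT_WAITER when the unpinning backend sees the shared refcount drop
to 1, i.e. only the waiter's own pin remains. Roughly (paraphrasing the
bufmgr.c code from memory, so treat names and details as approximate):

    /* after this backend has dropped its own pin ... */
    if (buf_state & BM_PIN_COUNT_WAITER)
    {
        buf_state = LockBufHdr(buf);

        if ((buf_state & BM_PIN_COUNT_WAITER) &&
            BUF_STATE_GET_REFCOUNT(buf_state) == 1)
        {
            /* we just released the last pin other than the waiter's */
            int     wait_backend_pgprocno = buf->wait_backend_pgprocno;

            buf_state &= ~BM_PIN_COUNT_WAITER;
            UnlockBufHdr(buf, buf_state);
            ProcSendSignal(wait_backend_pgprocno);  /* wakes the startup process */
        }
        else
            UnlockBufHdr(buf, buf_state);
    }

Once backend 2 has taken a new pin, that refcount check never passes when
backend 1 unpins, so either this condition (or how the waiter is tracked)
would have to change, or the startup process needs another way to wake up,
which is what always enabling the timeouts (next paragraph) provides.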

Before the standby limit time (default 30s) is reached,
ResolveRecoveryConflictWithBufferPin will enable both a standby limit
timeout and a deadlock timeout (default 1s). So if someone holds a
conflicting buffer pin for a long time, the deadlock timeout wakes the
startup process every 1s to recheck, until we reach the standby limit, at
which point it sends PROCSIG_RECOVERY_CONFLICT_BUFFERPIN. But once the
standby limit has been reached, the next time the startup process enters
ResolveRecoveryConflictWithBufferPin, it only sends
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN without enabling the timeouts, which is
what makes this race possible. So I thought a simple solution to address
this race would be to just always enable the timeouts.
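
To make that concrete, the shape of the change I have in mind in
ResolveRecoveryConflictWithBufferPin() is roughly the following (a sketch
paraphrasing the surrounding standby.c code from memory, not a literal
patch; names and details are approximate):

    TimestampTz ltime = GetStandbyLimitTime();
    EnableTimeoutParams timeouts[2];
    int         cnt = 0;

    /*
     * Instead of the current "already past the limit, just broadcast the
     * conflict signal and wait" fast path, always arm the timeouts before
     * waiting.
     */
    if (ltime != 0)
    {
        timeouts[cnt].id = STANDBY_TIMEOUT;
        timeouts[cnt].type = TMPARAM_AT;
        timeouts[cnt].fin_time = ltime;     /* may already be in the past */
        cnt++;
    }

    timeouts[cnt].id = STANDBY_DEADLOCK_TIMEOUT;
    timeouts[cnt].type = TMPARAM_AFTER;
    timeouts[cnt].delay_ms = DeadlockTimeout;
    cnt++;

    enable_timeouts(timeouts, cnt);

    /*
     * Wait to be signaled by UnpinBuffer() or by one of the timeouts.  Even
     * if a new backend re-pins the buffer and UnpinBuffer() never signals
     * us, the deadlock timeout (default 1s) guarantees the startup process
     * wakes up, retries the cleanup lock, and re-broadcasts
     * PROCSIG_RECOVERY_CONFLICT_BUFFERPIN if it still cannot get it.
     */
    ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN);

With the timeouts always armed, the worst case once the standby limit has
expired is roughly one extra deadlock_timeout interval (1s by default)
before conflicting backends are signaled again, instead of an unbounded
wait.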

For context, the reason I noticed this issue and am interested in getting
it fixed is that I have seen cases where the startup process gets stalled
for minutes and sometimes hours due to conflicting buffer pins, leading to
high replay lag, even though the standby limit has not been changed (still
the default 30s). My expectation is that once the standby limit has
expired, conflicting backends should be promptly canceled.

On Thu, Jun 19, 2025 at 8:18 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> On Thu, Jun 19, 2025 at 5:25 PM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference: 18961
> > Logged by: Anthony Hsu
> > Email address: erwaman(at)gmail(dot)com
> > PostgreSQL version: 17.5
> > Operating system: Linux
> > Description:
> >
> > In the current ResolveRecoveryConflictWithBufferPin implementation in
> > standby.c, I think there's a race scenario where a backend holding a
> > conflicting buffer pin won't receive a PROCSIG_RECOVERY_CONFLICT_BUFFERPIN
> > message promptly:
> > 1. Assume max_standby_streaming_delay has expired when the startup
> > process enters ResolveRecoveryConflictWithBufferPin
> > 2. Assume backend 1 holds a conflicting buffer pin while backend 2 does
> > not
> > 3. Since we are past the standby limit time, the startup process
> > broadcasts PROCSIG_RECOVERY_CONFLICT_BUFFERPIN here [1] without enabling
> > any timeouts
> > 4. Then the startup process waits to be woken up via
> > ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN) here [2]
> > 5. Suppose backend 2 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN first,
> > sees it does not hold a conflicting buffer pin, and *then* proceeds to
> > pin the buffer
> > 6. Suppose then backend 1 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
> > processes interrupts, and cancels itself. During cleanup, in
> > UnpinBuffer(), it will see the pin count is still > 1 (startup process +
> > backend 2 have it pinned), so it will NOT wake up the startup process.
> > 7. The startup process will only get woken up once backend 2 unpins the
> > buffer and the pin count reaches 1 (or some other signal causes the
> > startup process's latch to be set). Only then will it try to acquire the
> > cleanup lock again and broadcast another
> > PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message if it fails to acquire the
> > cleanup lock.
> > The unexpected behavior is in step (7), where it could be arbitrarily
> > long until the startup process is woken up again. The expected behavior
> > is that since max_standby_streaming_delay has already expired, the
> > startup process should wake up quickly and broadcast another
> > PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message if there are still
> > conflicting backends.
> > I was able to reproduce this scenario with some custom code to control
> > the execution sequence.
> > One way to fix this scenario is to just remove the `if` block here [3]
> > entirely so that we always enable the STANDBY_TIMEOUT and
> > STANDBY_DEADLOCK_TIMEOUT timeouts.
> > [1] https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L805
> > [2] https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L842
> > [3] https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L800-L806
>
> I agree this looks like a race condition, but I am not sure about the
> proposed solution. ResolveRecoveryConflictWithBufferPin() is called by
> LockBufferForCleanup() to wait for a process currently holding a
> buffer pin. However, new processes can still acquire the pin. So I
> think the problem lies in the wakeup logic of UnpinBuffer(), no? The
> intended behavior is for the startup process to be woken regardless of
> backend 2 subsequently acquiring the pin, but the current tracking
> mechanism is insufficient.
>
> --
> Regards,
> Dilip Kumar
> Google
>
