From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: erwaman(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored
Date: 2025-06-19 15:17:47
Message-ID: CAFiTN-sOv02tM0MKVpBp=35q3eb+Vp2fitKBPBt7=bDwYbWTCg@mail.gmail.com
Lists: pgsql-bugs
On Thu, Jun 19, 2025 at 5:25 PM PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 18961
> Logged by: Anthony Hsu
> Email address: erwaman(at)gmail(dot)com
> PostgreSQL version: 17.5
> Operating system: Linux
> Description:
>
> In the current ResolveRecoveryConflictWithBufferPin implementation in
> standby.c, I think there's a race scenario where a backend holding a
> conflicting buffer pin won't receive a PROCSIG_RECOVERY_CONFLICT_BUFFERPIN
> message promptly:
> 1. Assume max_standby_streaming_delay has expired when the startup process
> enters ResolveRecoveryConflictWithBufferPin
> 2. Assume backend 1 holds a conflicting buffer pin while backend 2 does not
> 3. Since we are past the standby limit time, the startup process broadcasts
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN here [1] without enabling any timeouts
> 4. Then the startup process waits to be woken up via
> ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN) here [2]
> 5. Suppose backend 2 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN first,
> sees it does not hold a conflicting buffer pin, and *then* proceeds to pin
> the buffer
> 6. Suppose then backend 1 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
> processes interrupts, and cancels itself. During cleanup, in UnpinBuffer(),
> it will see the pin count is still > 1 (startup process + backend 2 have it
> pinned), so it will NOT wake up the startup process.
> 7. The startup process will only get woken up once backend 2 unpins the
> buffer and the pin count reaches 1 (or some other signal causes the startup
> process's latch to be set). Only then will it retry acquiring the cleanup
> lock and, if that fails, broadcast another
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message.
>
> The unexpected behavior is in step (7), where it could be arbitrarily long
> until the startup process is woken up again. The expected behavior is that
> since max_standby_streaming_delay has already expired, the startup process
> should wake up quickly and broadcast another
> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message if there are still conflicting
> backends.
>
> I was able to reproduce this scenario with some custom code to control the
> execution sequence.
>
> One way to fix this scenario is to just remove the `if` block here [3]
> entirely so that we always enable the STANDBY_TIMEOUT and
> STANDBY_DEADLOCK_TIMEOUT timeouts.
>
> [1]
> https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L805
> [2]
> https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L842
> [3]
> https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L800-L806
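
For readers following along, the code path being discussed is roughly the
following (condensed from the standby.c lines linked above as [1]-[3]; a
sketch, not verbatim source):

    void
    ResolveRecoveryConflictWithBufferPin(void)
    {
        TimestampTz ltime = GetStandbyLimitTime();

        if (GetCurrentTimestamp() >= ltime && ltime != 0)
        {
            /*
             * Already past max_standby_streaming_delay: broadcast the
             * conflict signal immediately.  Note that no timeout is armed
             * on this path -- this is the `if` block the report proposes
             * to remove.
             */
            SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
        }
        else
        {
            /* Not yet past the limit: arm timeouts to bound the wait. */
            EnableTimeoutParams timeouts[2];
            int         cnt = 0;

            if (ltime != 0)
            {
                timeouts[cnt].id = STANDBY_TIMEOUT;
                timeouts[cnt].type = TMPARAM_AT;
                timeouts[cnt].fin_time = ltime;
                cnt++;
            }
            timeouts[cnt].id = STANDBY_DEADLOCK_TIMEOUT;
            timeouts[cnt].type = TMPARAM_AFTER;
            timeouts[cnt].delay_ms = DeadlockTimeout;
            cnt++;
            enable_timeouts(timeouts, cnt);
        }

        /*
         * Sleep until UnpinBuffer() signals us (or, on the else path, a
         * timeout fires).  On the first path no timeout is armed, so if
         * the UnpinBuffer() wakeup never arrives, the startup process
         * sleeps indefinitely -- the race described above.
         */
        ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN);

        disable_all_timeouts(false);
    }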
I agree this looks like a race condition, but I am not sure about the
proposed solution. ResolveRecoveryConflictWithBufferPin() is called by
LockBufferForCleanup() to wait for the processes currently holding a
buffer pin; however, while it waits, new processes can still acquire pins
on the same buffer. So I think the problem lies in the wakeup logic of
UnpinBuffer(), no? The intended behavior is for the startup process to be
woken regardless of backend 2 subsequently acquiring the pin, but the
current tracking mechanism is insufficient for that.
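
Concretely, the wakeup check I mean is the BM_PIN_COUNT_WAITER handling in
UnpinBuffer() (bufmgr.c). Condensed and not verbatim (the waiter field is
wait_backend_pgprocno in recent branches), it does roughly this:

    /* After dropping our own pin on the buffer ... */
    if (buf_state & BM_PIN_COUNT_WAITER)
    {
        buf_state = LockBufHdr(buf);

        if ((buf_state & BM_PIN_COUNT_WAITER) &&
            BUF_STATE_GET_REFCOUNT(buf_state) == 1)
        {
            /*
             * We released the last pin other than the waiter's own, so
             * wake the process waiting for the cleanup lock.
             */
            int     wait_backend_pgprocno = buf->wait_backend_pgprocno;

            buf_state &= ~BM_PIN_COUNT_WAITER;
            UnlockBufHdr(buf, buf_state);
            ProcSendSignal(wait_backend_pgprocno);
        }
        else
            UnlockBufHdr(buf, buf_state);   /* refcount still > 1: no wakeup */
    }

In step 6 of your scenario, backend 1's unpin sees a refcount of 2 under
the header lock (the startup process's own pin plus backend 2's), takes
the else branch, and never calls ProcSendSignal(), so the startup process
keeps sleeping with no timeout armed.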
--
Regards,
Dilip Kumar
Google