Set 1s WaitLatch timeout if standby limit has expired in ResolveRecoveryConflictWithBufferPin

From: Anthony Hsu <erwaman(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Set 1s WaitLatch timeout if standby limit has expired in ResolveRecoveryConflictWithBufferPin
Date: 2025-07-06 18:55:15
Message-ID: CALQc50gi-Kw9m1r6hytf12473-fCECy=q9JtKS4ANeJFEyCBTw@mail.gmail.com
Lists: pgsql-hackers

Hi,

I think there is a race in which a backend holding a conflicting buffer
pin isn't promptly canceled even though the standby limit has expired:

1. Suppose there is a buffer pin conflict and the standby limit has already
expired.
2. The startup process enters ResolveRecoveryConflictWithBufferPin and
broadcasts PROCSIG_RECOVERY_CONFLICT_BUFFERPIN here [A], but does not set
any timeouts.
3. The startup process waits to be signaled by UnpinBuffer() here [B].
4. Some non-conflicting backend receives the buffer pin signal sent in (2),
checks and sees that it is not blocking recovery, and *then* acquires a
conflicting buffer pin.
5. The original conflicting backend then receives the buffer pin signal
sent in (2) and cancels itself, calling UnpinBuffer(). But the pin count
is still > 1 (the pin from (4) plus the pin the startup process holds), so
the startup process is not woken up.

In this scenario, the startup process might not be woken up for an
arbitrarily long time, and the new conflicting backend (from step (4)
above) is never sent another PROCSIG_RECOVERY_CONFLICT_BUFFERPIN signal
telling it to cancel itself.

To handle this scenario, I think we should set a timeout on the WaitLatch
call if the standby limit has already expired. This lets the startup
process wake up within a reasonable time, recheck, and send
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN again to any new conflicting backends.
I have attached a small patch with this proposed fix.
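
To make the interleaving easier to follow, here is a toy model of steps
(1)-(5) in plain C. It is not PostgreSQL code and not the attached patch:
every name in it (ToyBuffer, ToyBackend, toy_run, and so on) is made up for
illustration, and the "timeout" is modeled as a simple loop tick in which
the waiter rechecks and re-broadcasts. It only assumes what is described
above: a backend cancels itself only if it holds the pin when it processes
the signal, and the waiter is woken only when the pin count drops to 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef struct { int pin_count; bool startup_woken; } ToyBuffer;
typedef struct { bool holds_pin; bool signal_pending; } ToyBackend;

static void toy_pin(ToyBuffer *buf, ToyBackend *be)
{
    be->holds_pin = true;
    buf->pin_count++;
}

/* The waiter is woken only when the count drops to 1 (its own pin). */
static void toy_unpin(ToyBuffer *buf, ToyBackend *be)
{
    assert(be->holds_pin && buf->pin_count > 1);
    be->holds_pin = false;
    if (--buf->pin_count == 1)
        buf->startup_woken = true;
}

/* Stand-in for broadcasting the buffer pin conflict signal. */
static void toy_broadcast(ToyBackend *bes, int n)
{
    for (int i = 0; i < n; i++)
        bes[i].signal_pending = true;
}

/* A backend cancels itself only if it holds the pin at this moment. */
static void toy_handle_signal(ToyBuffer *buf, ToyBackend *be)
{
    if (!be->signal_pending)
        return;
    be->signal_pending = false;
    if (be->holds_pin)
        toy_unpin(buf, be);
}

/*
 * Replays steps (1)-(5), then lets the waiter sleep for up to max_ticks.
 * Returns the number of timeout wakeups needed before the waiter is
 * woken by an unpin, or -1 if that never happens within max_ticks.
 */
static int toy_run(bool use_timeout, int max_ticks)
{
    ToyBuffer  buf = { .pin_count = 1, .startup_woken = false }; /* startup's pin */
    ToyBackend bes[2] = { { false, false }, { false, false } };
    ToyBackend *a = &bes[0], *b = &bes[1];

    toy_pin(&buf, a);            /* (1) A holds the conflicting pin */
    toy_broadcast(bes, 2);       /* (2) startup broadcasts once */
                                 /* (3) startup starts waiting */
    toy_handle_signal(&buf, b);  /* (4) B sees it is not blocking ... */
    toy_pin(&buf, b);            /*     ... and then pins the buffer */
    toy_handle_signal(&buf, a);  /* (5) A cancels; count 3 -> 2, no wakeup */

    for (int tick = 1; tick <= max_ticks; tick++)
    {
        if (buf.startup_woken)
            return tick - 1;
        if (!use_timeout)
            continue;            /* nothing ever re-signals B */
        toy_broadcast(bes, 2);   /* timeout fired: recheck and re-broadcast */
        for (int i = 0; i < 2; i++)
            toy_handle_signal(&buf, &bes[i]);
    }
    return buf.startup_woken ? max_ticks : -1;
}
```

In this model, toy_run(false, 10) never sees a wakeup because backend B is
never signaled again, while toy_run(true, 10) is woken after the first
timeout, once the re-broadcast reaches B.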

Thanks,
Anthony

[A]
https://github.com/postgres/postgres/blob/21c9756db6458f859e6579a6754c78154321cb39/src/backend/storage/ipc/standby.c#L806
[B]
https://github.com/postgres/postgres/blob/21c9756db6458f859e6579a6754c78154321cb39/src/backend/storage/ipc/standby.c#L843

Attachment: v1-0001-Set-1s-WaitLatch-timeout-if-standby-limit-has-exp.patch (application/octet-stream, 3.9 KB)