
Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting

From:	Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject:	Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting
Date:	2025-08-27 15:31:03
Message-ID:	CABPTF7Vr99gZ5GM_ZYbYnd9MMnoVW3pukBEviVoHKRvJW-dE3g@mail.gmail.com
Lists:	pgsql-hackers

Hi,

I attached the wrong patch,
v1-0001-Improve-read_local_xlog_page_guts-by-replacing-po.patch, to my
previous message. The correct one is attached here as v2.

On Wed, Aug 27, 2025 at 11:23 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi hackers,
>
> During a performance run [1], I observed heavy polling in
> read_local_xlog_page_guts(). Heikki’s comment from a few months ago
> suggests replacing the current check–sleep–repeat loop with the
> condition-variable (CV) infrastructure used by the walsender:
>
> 1) Problem and Background
> /*
> * Loop waiting for xlog to be available if necessary
> *
> * TODO: The walsender has its own version of this function, which uses a
> * condition variable to wake up whenever WAL is flushed. We could use the
> * same infrastructure here, instead of the check/sleep/repeat style of
> * loop.
> */
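For reference, the check/sleep/repeat style that the TODO describes comes down to roughly the loop below. This is a simplified sketch, not the actual code. `flushed_upto` and `poll_for_flush()` are hypothetical stand-ins for the shared flush pointer and the waiting part of read_local_xlog_page_guts(). The real loop also re-checks the timeline and handles interrupts, and the 1 ms sleep interval is an assumption chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t XLogRecPtr;       /* stand-in for PostgreSQL's XLogRecPtr */

/* Hypothetical stand-in for the shared WAL flush pointer. */
static _Atomic XLogRecPtr flushed_upto = 0;

/*
 * Check/sleep/repeat: return once flushed_upto >= target, reporting how
 * many times the condition was evaluated. Every failed check is followed
 * by a fixed 1 ms sleep, whether or not WAL is flushed in the meantime.
 */
static unsigned long
poll_for_flush(XLogRecPtr target)
{
    const struct timespec one_ms = {0, 1000000L};
    unsigned long checks = 0;

    for (;;)
    {
        checks++;
        if (atomic_load(&flushed_upto) >= target)
            break;
        nanosleep(&one_ms, NULL);   /* real code: interrupt and timeline checks too */
    }
    return checks;
}
```

Each failed check costs one full sleep interval. Wake-up latency therefore depends on the interval rather than on when the flush happens, and a long wait performs about one check per millisecond.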
>
> Because read_local_xlog_page_guts() waits for a specific flush or
> replay LSN, polling becomes inefficient when waits are long. I built a
> POC patch that swaps polling for CVs, but a single global CV (or even
> separate “flush” and “replay” CVs) isn’t ideal:
> • The wake-up routines don’t know which LSN each waiter cares about,
> so they would need to broadcast on every flush/replay.
>
> • Caching the minimum outstanding target LSN could reduce spurious
> wake-ups but won’t eliminate them when multiple backends wait for
> different LSNs simultaneously.
>
> • The walsender accepts some broadcast overhead, using two CVs for
> different kinds of waiters. A more precise approach would require a
> request queue that maps waiters to target LSNs and issues targeted
> wake-ups, which adds complexity.
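Here is a minimal sketch of such a request queue. It assumes a single process and a fixed-size binary min-heap keyed by target LSN. All names are hypothetical, and this is not the structure used by the WAIT FOR patch or the walsender. After the flush pointer advances, only waiters whose target has been reached are woken. The heap root doubles as the cached minimum target, so the common case where nobody is satisfied yet costs a single comparison.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;      /* stand-in for PostgreSQL's XLogRecPtr */

#define MAX_WAITERS 64

/* Hypothetical request queue: waiters ordered by target LSN (min-heap). */
typedef struct
{
    XLogRecPtr target[MAX_WAITERS];
    int        waiter_id[MAX_WAITERS];
    size_t     n;
} WaitQueue;

static void
swap_slots(WaitQueue *q, size_t a, size_t b)
{
    XLogRecPtr t = q->target[a];
    int        id = q->waiter_id[a];

    q->target[a] = q->target[b];
    q->waiter_id[a] = q->waiter_id[b];
    q->target[b] = t;
    q->waiter_id[b] = id;
}

/* Register a waiter; returns 0 on success, -1 if the queue is full. */
static int
queue_add(WaitQueue *q, int id, XLogRecPtr target)
{
    if (q->n == MAX_WAITERS)
        return -1;
    size_t i = q->n++;
    q->target[i] = target;
    q->waiter_id[i] = id;
    while (i > 0 && q->target[(i - 1) / 2] > q->target[i])   /* sift up */
    {
        swap_slots(q, i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
    return 0;
}

/* Remove the waiter with the smallest target; caller ensures q->n > 0. */
static void
queue_pop(WaitQueue *q)
{
    q->n--;
    if (q->n == 0)
        return;
    swap_slots(q, 0, q->n);       /* move the last element to the root */
    for (size_t i = 0;;)          /* sift down */
    {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < q->n && q->target[l] < q->target[m]) m = l;
        if (r < q->n && q->target[r] < q->target[m]) m = r;
        if (m == i)
            break;
        swap_slots(q, i, m);
        i = m;
    }
}

/*
 * Targeted wake-up after the flush pointer advances: wake (here, record in
 * woken[]) exactly the waiters whose target is now <= flushed.
 */
static size_t
wake_satisfied(WaitQueue *q, XLogRecPtr flushed, int *woken)
{
    size_t count = 0;

    while (q->n > 0 && q->target[0] <= flushed)
    {
        woken[count++] = q->waiter_id[0];   /* stand-in for setting its latch */
        queue_pop(q);
    }
    return count;
}
```

With a single broadcast CV, by contrast, every waiter would wake on each flush only to recheck its own target.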
>
> 2) Proposal
> I came across the thread “Implement waiting for WAL LSN replay:
> reloaded” [2] by Alexander. The “Implement WAIT FOR” patch in that
> thread provides a well-established infrastructure for waiting on WAL
> replay in backends. With modest adjustments, it could be generalized.
>
> Main changes in patch v1 ("Improve read_local_xlog_page_guts by
> replacing polling with latch-based waiting"):
> • Introduce WaitForLSNFlush, analogous to WaitForLSNReplay from the
> “WAIT FOR” work.
>
> • Replace the busy-wait in read_local_xlog_page_guts() with
> WaitForLSNFlush and WaitForLSNReplay.
>
> • Add wake-up calls in XLogFlush and XLogBackgroundFlush.
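The intended shape of a WaitForLSNFlush-style wait and the flush-side wake-up can be sketched with POSIX threads as stand-ins. A per-waiter `pthread_cond_t` plays the role of the backend's latch. `wait_for_lsn_flush()`, `flush_and_wake()`, `init_slots()` and the slot array are hypothetical names, not the patch's code. The waiter registers its target and rechecks the condition under the same mutex the flusher takes, so a wake-up cannot be lost between the check and the sleep. PostgreSQL latch code gets the same guarantee differently, by resetting the latch before re-checking the condition.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;      /* stand-in for PostgreSQL's XLogRecPtr */

#define MAX_WAITERS 8

/* Hypothetical per-backend wait slot; the condvar stands in for a latch. */
typedef struct
{
    bool           in_use;
    XLogRecPtr     target;
    pthread_cond_t latch;
} WaitSlot;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static XLogRecPtr      flushed_upto = 0;
static WaitSlot        slots[MAX_WAITERS];

/* Call once before any waiting. */
static void
init_slots(void)
{
    for (int i = 0; i < MAX_WAITERS; i++)
    {
        slots[i].in_use = false;
        pthread_cond_init(&slots[i].latch, NULL);
    }
}

/* Block until flushed_upto >= target; slot is a caller-assigned index. */
static XLogRecPtr
wait_for_lsn_flush(int slot, XLogRecPtr target)
{
    XLogRecPtr result;

    pthread_mutex_lock(&lock);
    slots[slot].target = target;
    slots[slot].in_use = true;
    /* Re-check after every wake-up; also covers spurious wake-ups. */
    while (flushed_upto < target)
        pthread_cond_wait(&slots[slot].latch, &lock);
    slots[slot].in_use = false;
    result = flushed_upto;
    pthread_mutex_unlock(&lock);
    return result;
}

/* Flush-path hook: advance the pointer, then wake only satisfied waiters. */
static void
flush_and_wake(XLogRecPtr upto)
{
    pthread_mutex_lock(&lock);
    if (upto > flushed_upto)
        flushed_upto = upto;
    for (int i = 0; i < MAX_WAITERS; i++)
        if (slots[i].in_use && slots[i].target <= flushed_upto)
            pthread_cond_signal(&slots[i].latch);
    pthread_mutex_unlock(&lock);
}
```

In this model, a backend would call `wait_for_lsn_flush(my_slot, target)` where the polling loop used to sleep. The flush path would call `flush_and_wake()` after advancing the flush pointer.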
>
> Edge Case: Timeline Switch During Wait
> /*
> * Check which timeline to get the record from.
> *
> * We have to do it each time through the loop because if we're in
> * recovery as a cascading standby, the current timeline might've
> * become historical. We can't rely on RecoveryInProgress() because in
> * a standby configuration like
> *
> * A => B => C
> *
> * if we're a logical decoding session on C, and B gets promoted, our
> * timeline will change while we remain in recovery.
> *
> * We can't just keep reading from the old timeline as the last WAL
> * archive in the timeline will get renamed to .partial by
> * StartupXLOG().
> * ...
> */
>
> read_local_xlog_page_guts() re-evaluates the active timeline on each
> loop iteration because, on a cascading standby, the current timeline
> can become historical. Once that happens, there’s no need to keep
> waiting for that timeline. A timeline switch could therefore render an
> in-progress wait unnecessary.
>
> One option is to add a wake-up at the point where the timeline switch
> occurs, so that waiting processes exit promptly. The current patch
> does not do this, because most waits are short and timeline changes
> on a cascading standby are rare. Supporting timeline-switch wake-ups
> would also require additional handling in both WaitForLSNFlush and
> WaitForLSNReplay, increasing complexity.
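The timeline-switch wake-up option described above, which the patch does not implement, could look roughly like the sketch below. It makes these assumptions: pthreads stand in for latches, a single broadcast CV is used for brevity, and `wait_for_replay_on_tli()`, `advance_replay()` and `switch_timeline()` are hypothetical names. The waiter exits when replay reaches its target or when the timeline it is waiting on stops being current. Without the broadcast in `switch_timeline()`, it would notice the switch only at the next replay wake-up.

```c
#include <pthread.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;      /* stand-ins for PostgreSQL's types */
typedef uint32_t TimeLineID;

typedef enum { WAIT_LSN_REACHED, WAIT_TIMELINE_CHANGED } WaitResult;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static XLogRecPtr      replayed_upto = 0;
static TimeLineID      current_tli = 1;

/* Wait for replay to reach target on timeline tli, or for tli to go stale. */
static WaitResult
wait_for_replay_on_tli(XLogRecPtr target, TimeLineID tli)
{
    WaitResult res;

    pthread_mutex_lock(&lock);
    while (replayed_upto < target && current_tli == tli)
        pthread_cond_wait(&cv, &lock);
    /* Report success if the target was reached; otherwise the timeline changed. */
    res = (replayed_upto >= target) ? WAIT_LSN_REACHED : WAIT_TIMELINE_CHANGED;
    pthread_mutex_unlock(&lock);
    return res;
}

/* Replay progress: wake every waiter to recheck its own condition. */
static void
advance_replay(XLogRecPtr upto)
{
    pthread_mutex_lock(&lock);
    if (upto > replayed_upto)
        replayed_upto = upto;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&lock);
}

/* Timeline switch; the optional extra wake-up is controlled by wake_waiters. */
static void
switch_timeline(TimeLineID new_tli, int wake_waiters)
{
    pthread_mutex_lock(&lock);
    current_tli = new_tli;
    if (wake_waiters)
        pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&lock);
}
```

After a `WAIT_TIMELINE_CHANGED` result, the caller would re-evaluate which timeline to read from, as read_local_xlog_page_guts() already does on each loop iteration.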
>
> Comments and suggestions are welcome.
>
> [1] https://www.postgresql.org/message-id/CABPTF7VuFYm9TtA9vY8ZtS77qsT+yL_HtSDxUFnW3XsdB5b9ew@mail.gmail.com
> [2] https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO%2BBBjcirozJ6nYbOW8Q%40mail.gmail.com
>
> Best,
> Xuneng

Attachment	Content-Type	Size
v2-0001-Improve-read_local_xlog_page_guts-by-replacing-po.patch	application/octet-stream	7.8 KB

In response to

Improve read_local_xlog_page_guts by replacing polling with latch-based waiting at 2025-08-27 15:23:34 from Xuneng Zhou

Responses

Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting at 2025-08-28 08:22:55 from Xuneng Zhou
