Re: Implement waiting for wal lsn replay: reloaded

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2025-08-07 15:00:50
Message-ID: CABPTF7U7_iYEXn3xeV29-xwFWqTc+2N4=CdL7qEzPPg=hv8eqA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for working on this.

I’ve just come across this thread and haven’t had a chance to dig into
the patch yet, but I’m keen to review it soon. In the meantime, I have
a quick question: is WAIT FOR REPLY intended mainly for user-defined
functions, or can internal code invoke it as well?

During a recent performance run [1] I noticed heavy polling in
read_local_xlog_page_guts(). Heikki’s comment from a few months ago
also hints that we could replace this check–sleep–repeat loop with the
condition-variable (CV) infrastructure used by walsender:

/*
* Loop waiting for xlog to be available if necessary
*
* TODO: The walsender has its own version of this function, which uses a
* condition variable to wake up whenever WAL is flushed. We could use the
* same infrastructure here, instead of the check/sleep/repeat style of
* loop.
*/

Because read_local_xlog_page_guts() waits for a specific flush or
replay LSN, polling becomes inefficient when the wait is long. I built
a POC patch that swaps polling for CVs, but a single global CV (or
even separate “flush” and “replay” CVs) isn’t ideal:

The wake-up routines don’t know which LSN each waiter cares about, so
they’d have to broadcast on every flush/replay. Caching the minimum
outstanding LSN could reduce spuriously awakened waiters, yet wouldn’t
eliminate them—multiple backends might wait for different LSNs
simultaneously. A more precise solution would require a request queue
that maps waiters to target LSNs and issues targeted wake-ups, adding
complexity.

Walsender accepts the potential broadcast overhead by using two cvs
for different waiters, so it might be acceptable for
read_local_xlog_page_guts() as well. However, if WAIT FOR REPLY
becomes available to backend code, we might leverage it to eliminate
the polling for waiting replay in read_local_xlog_page_guts() without
introducing a bespoke dispatcher. I’d appreciate any thoughts on
whether that use case is in scope.

Best,
Xuneng

[1] https://www.postgresql.org/message-id/CABPTF7VuFYm9TtA9vY8ZtS77qsT+yL_HtSDxUFnW3XsdB5b9ew@mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2025-08-07 15:17:17 Re: VM corruption on standby
Previous Message Tom Lane 2025-08-07 14:58:56 Re: headerscheck warnings with late-model gcc