Re: Implement waiting for wal lsn replay: reloaded

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2025-12-22 07:56:59
Message-ID: CABPTF7WJf5BnG1yCVk032+QiGuCmrhgnfNFO2HTgcuWAeRD9+A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi Alexander,
>
> Thanks for your feedback!
>
> > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> > mode parameter. Should we allow this?
>
> I think this constraint could be relaxed if needed. I was previously
> unsure about the use cases.

Flush mode on the primary seems useful when synchronous_commit is set
to off [1]. In that mode, a transaction in primary may return success
before its WAL is durably flushed to disk, trading durability for
lower latency. A “wait for primary flush” operation provides an
explicit durability barrier for cases where applications or tools
occasionally need stronger guarantees.

[1] https://postgresqlco.nf/doc/en/param/synchronous_commit/

> > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
> > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH? In
> > principle, we could encode both as just 'flush' mode, and detect which
> > WaitLSNType to pick by checking if recovery is in progress. However,
> > how should we then react to unreached flush location after standby
> > promotion (technically it could be still reached but on the different
> > timeline)?
> >
>
> Technically, we can use 'flush' mode to specify WAIT FOR behavior in
> both primary and replica. Currently, wait for commands error out if
> promotion occurs since: either the requested LSN type does not exist
> on the primary, or we do not yet have the infrastructure to support
> continuing the wait. If we allow waiting for flush on the primary as a
> user-visible command and the wake-up calls for flush in primary are
> introduced, the question becomes whether we should still abort the
> wait on promotion, or continue waiting—as you noted—given that the
> target LSN might still be reached, albeit on a different timeline. The
> question behind this might be: do users care and should be aware of
> the state change of the server while waiting? If they do, then we
> better stop the waiting and report the error. In this case, I am
> inclined to to break the unified flush mode to something like
> primary_flush/standby_flush mode and
> WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
>

After further consideration, it also seems reasonable to use a single,
unified flush mode that works on both primary and standby servers,
provided its semantics are clearly documented to avoid the potential
confusion on failure. I don’t have a strong preference between these
two and would be interested in your thoughts.

If a standby is promoted while a session is waiting, the command
better abort and return an error (or report “not in recovery” when
using NO_THROW). At that point, the meaning of the LSN being waited
for may have changed due to the timeline switch and the transition
from standby to primary. An LSN such as 0/5000000 on TLI 2 can
represent entirely different WAL content from 0/5000000 on TLI 1.
Allowing the wait to silently continue across promotion risks giving
users a false sense of safety—for example, interpreting “wait
completed” as “the original data is now durable,” which would no
longer be true.

--
Best,
Xuneng

Attachment Content-Type Size
synchronous_commit.png image/png 256.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message preTham 2025-12-22 08:22:29 Re: [PATCH] Why is_admin_of_role() uses ROLERECURSE_MEMBERS rather than ROLERECURSE_PRIVS?
Previous Message Chao Li 2025-12-22 07:19:39 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)