Re: Implement waiting for wal lsn replay: reloaded

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject:	Re: Implement waiting for wal lsn replay: reloaded
Date:	2025-12-25 11:13:36
Message-ID:	CAPpHfdsBPZem8LnejpAxXbkxrCW86_jWM7dJFm+Miug2HgJJ0Q@mail.gmail.com
Lists:	pgsql-hackers

Hi, Xuneng!

On Mon, Dec 22, 2025 at 9:57 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi Alexander,
> >
> > Thanks for your feedback!
> >
> > > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> > > mode parameter. Should we allow this?
> >
> > I think this constraint could be relaxed if needed. I was previously
> > unsure about the use cases.
>
> Flush mode on the primary seems useful when synchronous_commit is set
> to off [1]. In that mode, a transaction on the primary may return success
> before its WAL is durably flushed to disk, trading durability for
> lower latency. A “wait for primary flush” operation provides an
> explicit durability barrier for cases where applications or tools
> occasionally need stronger guarantees.
>
> [1] https://postgresqlco.nf/doc/en/param/synchronous_commit/
>
> > > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be a
> > > separate mode value or the same as WAIT_LSN_TYPE_STANDBY_FLUSH? In
> > > principle, we could encode both as just 'flush' mode, and detect which
> > > WaitLSNType to pick by checking if recovery is in progress. However,
> > > how should we then react to an unreached flush location after standby
> > > promotion (technically it could still be reached, but on a different
> > > timeline)?
> > >
> >
> > Technically, we can use 'flush' mode to specify WAIT FOR behavior on
> > both primary and replica. Currently, WAIT FOR commands error out if
> > promotion occurs because either the requested LSN type does not exist
> > on the primary, or we do not yet have the infrastructure to support
> > continuing the wait. If we allow waiting for flush on the primary as a
> > user-visible command and the wake-up calls for flush on the primary are
> > introduced, the question becomes whether we should still abort the
> > wait on promotion or continue waiting, as you noted, given that the
> > target LSN might still be reached, albeit on a different timeline. The
> > question behind this might be: do users care about, and should they be
> > aware of, the state change of the server while waiting? If they do, then
> > we had better stop waiting and report the error. In this case, I am
> > inclined to split the unified flush mode into something like
> > primary_flush/standby_flush modes, with
> > WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
> >
>
> After further consideration, it also seems reasonable to use a single,
> unified flush mode that works on both primary and standby servers,
> provided its semantics are clearly documented to avoid the potential
> confusion on failure. I don’t have a strong preference between these
> two and would be interested in your thoughts.
>
> If a standby is promoted while a session is waiting, the command
> should abort and return an error (or report “not in recovery” when
> using NO_THROW). At that point, the meaning of the LSN being waited
> for may have changed due to the timeline switch and the transition
> from standby to primary. An LSN such as 0/5000000 on TLI 2 can
> represent entirely different WAL content from 0/5000000 on TLI 1.
> Allowing the wait to silently continue across promotion risks giving
> users a false sense of safety—for example, interpreting “wait
> completed” as “the original data is now durable,” which would no
> longer be true.

Agreed, but there is still a risk that promotion happens after the user
sends the query but before we start to wait. In that case we would still
silently start waiting on the primary, while the user probably meant to
wait on a replica. Wouldn't it be safer to have separate user-visible
modes for waiting on the primary and on a replica?
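
To make the difference concrete, here is a minimal standalone sketch, not
code from the patch. The enum below is a cut-down stand-in for the real
WaitLSNType, the mode strings follow the primary_flush/standby_flush naming
suggested upthread, and resolve_unified()/resolve_explicit() are
hypothetical helpers. The in_recovery argument stands for the server state
observed at the moment the wait actually begins.

```c
#include <stdbool.h>
#include <string.h>

/* Cut-down stand-in for the patch's WaitLSNType (assumption). */
typedef enum WaitLSNType
{
	WAIT_LSN_TYPE_STANDBY_FLUSH,
	WAIT_LSN_TYPE_PRIMARY_FLUSH
} WaitLSNType;

/*
 * Unified mode: 'flush' follows whatever role the server has when the
 * wait starts.  A promotion between sending the query and starting the
 * wait silently turns a standby wait into a primary wait.
 */
bool
resolve_unified(const char *mode, bool in_recovery, WaitLSNType *type)
{
	if (strcmp(mode, "flush") != 0)
		return false;

	*type = in_recovery ? WAIT_LSN_TYPE_STANDBY_FLUSH
		: WAIT_LSN_TYPE_PRIMARY_FLUSH;
	return true;
}

/*
 * Separate modes: the user states the intended role, so a mismatch with
 * the current server state can be reported instead of being hidden.
 */
bool
resolve_explicit(const char *mode, bool in_recovery, WaitLSNType *type)
{
	if (strcmp(mode, "standby_flush") == 0 && in_recovery)
	{
		*type = WAIT_LSN_TYPE_STANDBY_FLUSH;
		return true;
	}
	if (strcmp(mode, "primary_flush") == 0 && !in_recovery)
	{
		*type = WAIT_LSN_TYPE_PRIMARY_FLUSH;
		return true;
	}
	return false;			/* unknown mode or wrong server role */
}
```

With the unified variant, a query sent to a standby that gets promoted
before the wait starts resolves to the primary type without any error. With
the explicit variant the same sequence returns false, which the caller
could turn into an error (or a "not in recovery" result under NO_THROW).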

------
Regards,
Alexander Korotkov
Supabase

In response to

Re: Implement waiting for wal lsn replay: reloaded at 2025-12-22 07:56:59 from Xuneng Zhou

Responses

Re: Implement waiting for wal lsn replay: reloaded at 2025-12-25 12:52:34 from Xuneng Zhou
