Re: Implement waiting for wal lsn replay: reloaded

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2026-04-06 07:15:15
Message-ID: CABPTF7V-E_e3kQ2vtwUz6Jy7u-8_YeUT0SDoAbu7EKPgNp=ndA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 6, 2026 at 2:01 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> On Mon, Apr 6, 2026 at 11:26 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi Alexander,
> >
> > On Sun, Apr 5, 2026 at 8:31 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jan 29, 2026 at 9:47 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > > > On Tue, Jan 27, 2026 at 3:14 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > > > > Heikki spotted a misplaced wake-up call for replay waiters in
> > > > > PerformWalRecovery. He suggested that the WaitLSNWakeup needs to be
> > > > > invoked immediately after wal record is applied to avoid the potential
> > > > > missed wake-ups when recovery stops/pauses/promotes. It makes sense to
> > > > > me. Please check the attached patch to fix that.
> > > >
> > > > Pushed, thank you!
> > >
> > > I've assembled small patches, which I think worth pushing before v19 FF.
> > >
> > > 1. Avoid syscache lookup in WaitStmtResultDesc(). This is [1] patch,
> > > but I've removed redundant comment in ExecWaitStmt(), which explained
> > > the same as WaitStmtResultDesc() comment.
> > > 2. Use WAIT FOR LSN in wait_for_catchup(). I made the following
> > > changes: fallback to polling on not_in_recovery result instead of
> > > croaking, avoid separate pg_is_in_recovery() query, comment why we
> > > may face the recovery conflict and why it is safe to detect this by
> > > the error string.
> > > 3. A paragraph to the docs about possible recovery conflicts and their reason.
> > >
> > > I'm going to push this on Monday if no objections.
> > >
> > > Links.
> > > 1. https://www.postgresql.org/message-id/CABPTF7U%2BSUnJX_woQYGe%3D%3DR9Oz%2B-V6X0VO2stBLPGfJmH_LEhw%40mail.gmail.com
> > > 2. https://www.postgresql.org/message-id/CABPTF7X0n%3DR50z2fBpj3EbYYz04Ab0-DHJa%2BJfoAEny62QmUdg%40mail.gmail.com
> >
> > I'll review them shortly.
> >
>
> Here're some comments:
>
> 1) Patch 2 checks for not_in_recovery, but WAIT FOR ... NO_THROW
> returns “not in recovery”, which appears to prevent the
> promoted-standby fallback described in the patch from triggering.
> Instead, it seems to result in a hard failure.
>
> There are some behavior changes that I overlooked earlier:
>
> 2) Patch 2 changes the helper from "wait until timeout if there is no
> active replication connection" to "connect to the standby immediately
> and fail on ordinary connection failures". This appears to differ from
> the current contract and may affect cases where the standby is down,
> restarting, or not yet accepting SQL.
>
> "If there is no active replication connection from this peer, waits
> until poll_query_until timeout."
>
> 3) In patch 2, we returns success as soon as the standby locally
> reaches the target LSN, but the existing helper is explicitly defined
> in terms of pg_stat_replication ... state = 'streaming' on the
> upstream. So the new path can report success even when there is no
> active replication connection from that peer anymore.
>
> "The replication connection must be in a streaming state."
>
> I’m not sure whether these semantic changes are intended.
>

After some thoughts, these semantic changes appear to be improvements
over the artifacts of polling-based behavior. The pg_stat_replication
... state = 'streaming' is a precondition of polling works rather than
a post-condition check. The retry timeout also has nothing to do with
the edge cases and the state of the standby.

In practice, wait_for_catchup() is called when the standby is expected
to be up. If it's not, one of two things is true:

The standby is briefly restarting -> start() already waits for
readiness before returning, so this is a non-issue.
The standby is genuinely down -> waiting 180 seconds to time out
wastes CI time compared to failing fast.

I’ve addressed point 1 and updated the comments accordingly to reflect
the new behavior. Please check it.

--
Best,
Xuneng

Attachment Content-Type Size
v4-0001-Avoid-syscache-lookup-while-building-a-WAIT-FOR-t.patch application/x-patch 1.8 KB
v4-0003-Document-that-WAIT-FOR-may-be-interrupted-by-reco.patch application/x-patch 1.5 KB
v4-0002-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch application/x-patch 7.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Илья Чердаков 2026-04-06 07:55:25 Environment variable to disable diffs file output
Previous Message lakshmi 2026-04-06 06:54:46 Re: Pgbench: remove synchronous prepare