| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
| Subject: | Re: Implement waiting for wal lsn replay: reloaded |
| Date: | 2026-04-06 06:01:11 |
| Message-ID: | CABPTF7Um7oRKBYmek_3gfbaMg5DLzHZmAX-GEacHASfD34xo5g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Apr 6, 2026 at 11:26 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi Alexander,
>
> On Sun, Apr 5, 2026 at 8:31 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >
> > On Thu, Jan 29, 2026 at 9:47 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > > On Tue, Jan 27, 2026 at 3:14 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > > > Heikki spotted a misplaced wake-up call for replay waiters in
> > > > PerformWalRecovery. He suggested that the WaitLSNWakeup needs to be
> > > > invoked immediately after wal record is applied to avoid the potential
> > > > missed wake-ups when recovery stops/pauses/promotes. It makes sense to
> > > > me. Please check the attached patch to fix that.
> > >
> > > Pushed, thank you!
> >
> > I've assembled small patches, which I think worth pushing before v19 FF.
> >
> > 1. Avoid syscache lookup in WaitStmtResultDesc(). This is [1] patch,
> > but I've removed redundant comment in ExecWaitStmt(), which explained
> > the same as WaitStmtResultDesc() comment.
> > 2. Use WAIT FOR LSN in wait_for_catchup(). I made the following
> > changes: fallback to polling on not_in_recovery result instead of
> > croaking, avoid separate pg_is_in_recovery() query, comment why we
> > may face the recovery conflict and why it is safe to detect this by
> > the error string.
> > 3. A paragraph to the docs about possible recovery conflicts and their reason.
> >
> > I'm going to push this on Monday if no objections.
> >
> > Links.
> > 1. https://www.postgresql.org/message-id/CABPTF7U%2BSUnJX_woQYGe%3D%3DR9Oz%2B-V6X0VO2stBLPGfJmH_LEhw%40mail.gmail.com
> > 2. https://www.postgresql.org/message-id/CABPTF7X0n%3DR50z2fBpj3EbYYz04Ab0-DHJa%2BJfoAEny62QmUdg%40mail.gmail.com
>
> I'll review them shortly.
>
Here're some comments:
1) Patch 2 checks for not_in_recovery, but WAIT FOR ... NO_THROW
returns “not in recovery”, which appears to prevent the
promoted-standby fallback described in the patch from triggering.
Instead, it seems to result in a hard failure.
There are some behavior changes that I overlooked earlier:
2) Patch 2 changes the helper from "wait until timeout if there is no
active replication connection" to "connect to the standby immediately
and fail on ordinary connection failures". This appears to differ from
the current contract and may affect cases where the standby is down,
restarting, or not yet accepting SQL.
"If there is no active replication connection from this peer, waits
until poll_query_until timeout."
3) In patch 2, we returns success as soon as the standby locally
reaches the target LSN, but the existing helper is explicitly defined
in terms of pg_stat_replication ... state = 'streaming' on the
upstream. So the new path can report success even when there is no
active replication connection from that peer anymore.
"The replication connection must be in a streaming state."
I’m not sure whether these semantic changes are intended.
--
Best,
Xuneng
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Xuneng Zhou | 2026-04-06 06:04:14 | Re: tablecmds: fix bug where index rebuild loses replica identity on partitions |
| Previous Message | Peter Geoghegan | 2026-04-06 05:47:43 | Re: index prefetching |