Re: Implement waiting for wal lsn replay: reloaded

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2026-04-28 21:01:32
Message-ID: CAPpHfdufWG032J=fyv1eWoveeyPwqJ57PGU2edA5OsOmexGDTw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 21, 2026 at 7:03 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> On Tue, Apr 21, 2026 at 2:46 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> > The updated patchset is attached. It includes improved coverage as
> > suggested by Andres upthread, and documentation that WAIT FOR LSN is
> > timeline-blind (per off-list discussion with Xuneng).
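Here is a minimal C sketch of what "timeline-blind" means. It assumes the wait condition reduces to a plain comparison of 64-bit WAL positions. `XLogRecPtr` and `TimeLineID` mirror PostgreSQL's integer typedefs. `StandbyPosition` and `target_reached()` are hypothetical names for illustration and are not taken from the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Integer types mirroring PostgreSQL's typedefs. */
typedef uint64_t XLogRecPtr;    /* WAL position */
typedef uint32_t TimeLineID;    /* timeline identifier */

/* Hypothetical snapshot of where a standby currently is. */
typedef struct
{
    TimeLineID  tli;            /* timeline being replayed */
    XLogRecPtr  lsn;            /* position reached on that timeline */
} StandbyPosition;

/*
 * Hypothetical timeline-blind wait condition: only the LSN is compared.
 * pos->tli is deliberately ignored, so a position on a newer timeline
 * satisfies a target taken on an older one once its LSN is large enough.
 */
static bool
target_reached(const StandbyPosition *pos, XLogRecPtr target)
{
    return pos->lsn >= target;
}
```

Under this model, a target LSN taken before a timeline switch counts as reached on the new timeline as soon as the numeric position passes it.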
>
> I revised the test patch 6 to make the new cases check the intended
> WAIT FOR behavior more directly, and to avoid cases where the test
> could pass for the wrong reason.
>
> The fresh walreceiver restart test now distinguishes what we can
> observe from what is only covered indirectly.
> 'pg_last_wal_receive_lsn()' reports 'flushedUpto', not 'writtenUpto',
> so the test now describes that state accurately and covers
> 'writtenUpto' through the 'standby_write' result. This seems
> appropriate to me since the two positions are seeded in the same places
> and under the same conditions, so a test for the flush LSN should also
> help verify the write LSN.
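The argument above can be modeled in a few lines of C. This sketch rests on an explicit assumption: once both positions are seeded, `flushedUpto` never runs ahead of `writtenUpto`. Only the two field names come from the thread. `ReceiverPositions`, `WaitMode`, `frontier_for()` and `wait_satisfied()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* WAL position, as in PostgreSQL */

/*
 * Hypothetical snapshot of the two walreceiver positions from the thread.
 * ASSUMPTION (from the reasoning above, not enforced here): both start from
 * the same seed and flushedUpto never exceeds writtenUpto.
 */
typedef struct
{
    XLogRecPtr  writtenUpto;    /* WAL written by walreceiver */
    XLogRecPtr  flushedUpto;    /* WAL flushed; what pg_last_wal_receive_lsn() shows */
} ReceiverPositions;

typedef enum
{
    WAIT_STANDBY_WRITE,
    WAIT_STANDBY_FLUSH
} WaitMode;

/* Pick the frontier a waiter in the given mode compares against. */
static XLogRecPtr
frontier_for(const ReceiverPositions *p, WaitMode mode)
{
    return (mode == WAIT_STANDBY_WRITE) ? p->writtenUpto : p->flushedUpto;
}

/* A wait for `target` is satisfied once the chosen frontier reaches it. */
static bool
wait_satisfied(const ReceiverPositions *p, WaitMode mode, XLogRecPtr target)
{
    return target <= frontier_for(p, mode);
}
```

Under that assumption, any target that satisfies a flush-mode wait also satisfies a write-mode wait. This is why checking the flush LSN also exercises the write path indirectly.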
>
> The fencepost tests were split by the actual frontier being tested.
> 'standby_replay' uses 'pg_last_wal_replay_lsn()', while
> 'standby_flush' uses 'pg_last_wal_receive_lsn()'. This avoids treating
> a replay-derived LSN as if it were also the exact write/flush
> boundary. I left 'standby_write' out of the exact fencepost helper
> because its frontier is not SQL-visible once walreceiver is stopped.
> The async wakeup case now starts the waiter while replay is still
> paused, so it must actually sleep before replay and walreceiver are
> allowed to advance.
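A fencepost test pins the comparison at its boundary. The sketch below assumes a wait completes once `target <= frontier`; all helper names in it are hypothetical. It checks that waiting for exactly the frozen frontier succeeds while waiting one byte past it does not. It also shows that an off-by-one `<` comparison fails that check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* WAL position, as in PostgreSQL */

/* Signature of a "has the frontier reached the target?" predicate. */
typedef bool (*ReachedFn) (XLogRecPtr frontier, XLogRecPtr target);

/* Assumed wait condition: done once the frontier is at or past the target. */
static bool
lsn_reached(XLogRecPtr frontier, XLogRecPtr target)
{
    return target <= frontier;
}

/* Deliberately off-by-one variant, to show what a fencepost test catches. */
static bool
lsn_reached_strict(XLogRecPtr frontier, XLogRecPtr target)
{
    return target < frontier;
}

/*
 * With the frontier frozen (e.g. replay paused or walreceiver stopped),
 * a wait for exactly the frontier must complete and a wait for the next
 * byte must not.
 */
static bool
fencepost_holds(ReachedFn reached, XLogRecPtr frontier)
{
    return reached(frontier, frontier) && !reached(frontier, frontier + 1);
}
```

In the tests described above, the frontier comes from `pg_last_wal_replay_lsn()` for `standby_replay` and from `pg_last_wal_receive_lsn()` for `standby_flush`.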
>
> The cascading timeline-switch test now checks the 'WAIT FOR ...
> NO_THROW' status from background psql stdout. The previous log-marker
> pattern could pass after an unexpected returned status, including
> 'timeout', because the following statement would still run. The
> 'received_tli > 1' check remains, but only as confirmation that the
> downstream followed the new timeline; the 'success' status proves the
> wait completed as intended.
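The weakness of the old log-marker pattern can be shown with a small hypothetical C model. With NO_THROW, the wait returns a status instead of raising an error, so the statement after it runs either way. `StepOutcome`, `run_step()` and the two check functions are illustrative names only. The model uses only the 'success' and 'timeout' statuses mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical model of one background-psql step: WAIT FOR ... NO_THROW
 * reports its outcome as a status string instead of raising an error, so
 * the statement that follows it (the log marker) runs in every case.
 */
typedef struct
{
    const char *wait_status;    /* e.g. "success" or "timeout" */
    bool        marker_logged;  /* did the following statement run? */
} StepOutcome;

static StepOutcome
run_step(const char *status)
{
    StepOutcome out = {status, true};   /* marker runs regardless of status */

    return out;
}

/* Old-style check: pass if the log marker appeared. */
static bool
passes_marker_check(StepOutcome o)
{
    return o.marker_logged;
}

/* Revised check: pass only if the wait itself reported success. */
static bool
passes_status_check(StepOutcome o)
{
    return strcmp(o.wait_status, "success") == 0;
}
```

A 'timeout' outcome passes the marker check but fails the status check, which is the false pass the revised test rules out.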
>
> Please check it.

LGTM. I've added some comments for the new functions in 0006. I propose
to push this patchset. Something is probably still missing, and we
will have to come back to this, but it seems to make many aspects
much better.

------
Regards,
Alexander Korotkov
Supabase

Attachment                                                       Content-Type              Size
v7-0001-Use-barrier-semantics-when-reading-writing-writte.patch  application/octet-stream  3.1 KB
v7-0002-Fix-memory-ordering-in-WAIT-FOR-LSN-wakeup-mechan.patch  application/octet-stream  4.3 KB
v7-0003-Remove-redundant-WAIT-FOR-LSN-caller-side-pre-che.patch  application/octet-stream  5.2 KB
v7-0004-Use-replay-position-as-floor-for-WAIT-FOR-LSN-sta.patch  application/octet-stream  8.7 KB
v7-0005-Wake-standby_write-standby_flush-waiters-from-the.patch  application/octet-stream  5.9 KB
v7-0006-Improve-WAIT-FOR-LSN-test-coverage.patch                 application/octet-stream  14.0 KB
v7-0007-Document-that-WAIT-FOR-LSN-is-timeline-blind.patch       application/octet-stream  1.9 KB
