Re: Implement waiting for wal lsn replay: reloaded

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2026-04-21 04:03:30
Message-ID: CABPTF7WJ35p7uidJJZs7fzxBtbVL_0xSFUdZ2Fe8pXh00e=Mxw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 21, 2026 at 2:46 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:

> The updated patchset is attached. It includes improved coverage as
> suggested by Andres upthread. And documentation that WAIT FOR LSN is
> timeline-blind (per off-list discussion with Xuneng).
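One way to read "timeline-blind" is shown below as a small C model. It is illustrative only. `WalPosition` and `wait_target_reached()` are hypothetical names, not server code. The sketch assumes the wait target is a bare LSN, as in `WAIT FOR LSN '...'`, so the predicate never looks at the timeline ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a WAL position: timeline ID plus byte LSN. */
typedef struct WalPosition
{
    uint32_t tli;   /* timeline ID */
    uint64_t lsn;   /* byte position in the WAL stream */
} WalPosition;

/*
 * Timeline-blind wait predicate (sketch): only the LSN is compared.
 * The target carries no timeline, so a frontier on any timeline that
 * has reached the target LSN satisfies the wait.
 */
static bool
wait_target_reached(WalPosition frontier, uint64_t target_lsn)
{
    return frontier.lsn >= target_lsn;
}
```

Under this reading, equal LSNs on different timelines look the same to the waiter.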

I revised test patch 0006 to make the new cases check the intended
WAIT FOR behavior more directly, and to avoid cases where a test
could pass for the wrong reason.

The fresh walreceiver restart test now distinguishes what we can
observe from what is only covered indirectly.
'pg_last_wal_receive_lsn()' reports 'flushedUpto', not 'writtenUpto',
so the test now describes that state accurately and covers
'writtenUpto' through the 'standby_write' result. This seems
appropriate to me because the two positions are seeded in the same
places and under the same conditions, so a test for the flush LSN
should also help verify the write LSN.
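The argument that the flush test also covers the write position can be sketched with a simplified C model. `ReceiverPositions` and its helpers are hypothetical stand-ins for the walreceiver's 'writtenUpto' and 'flushedUpto'. The model assumes, as stated above, that both positions are seeded at the same point. It does not model how the real walreceiver seeds or publishes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the walreceiver's positions. */
typedef struct ReceiverPositions
{
    uint64_t written_upto;  /* models writtenUpto: WAL written out */
    uint64_t flushed_upto;  /* models flushedUpto: WAL flushed; what
                             * pg_last_wal_receive_lsn() reports */
} ReceiverPositions;

/* Assumption from the message: both positions are seeded together. */
static void
seed_positions(ReceiverPositions *rp, uint64_t start)
{
    rp->written_upto = start;
    rp->flushed_upto = start;
}

/* Writing advances only the write frontier. */
static void
write_wal(ReceiverPositions *rp, uint64_t nbytes)
{
    rp->written_upto += nbytes;
}

/* Flushing can only catch up to what has already been written. */
static void
flush_wal(ReceiverPositions *rp)
{
    rp->flushed_upto = rp->written_upto;
}

/* Invariant the model maintains: flushed_upto <= written_upto. */
static bool
positions_consistent(const ReceiverPositions *rp)
{
    return rp->flushed_upto <= rp->written_upto;
}
```

In this model flushing never moves past writing. Any target that satisfies a flush-mode check therefore also satisfies a write-mode check. That is the sense in which the flush test gives indirect coverage of the write position.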

The fencepost tests were split by the actual frontier being tested.
'standby_replay' uses 'pg_last_wal_replay_lsn()', while
'standby_flush' uses 'pg_last_wal_receive_lsn()'. This avoids treating
a replay-derived LSN as if it were also the exact write/flush
boundary. I left 'standby_write' out of the exact fencepost helper
because its frontier is not SQL-visible once walreceiver is stopped.
The async wakeup case now starts the waiter while replay is still
paused, so it must actually sleep before replay and walreceiver are
allowed to advance.
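The fencepost and async-wakeup cases check two properties. First, a target exactly at the frontier completes at once, while frontier + 1 does not. Second, a waiter that went to sleep is woken only after the frontier reaches its target. The single-threaded C model below illustrates both. `Waiter`, `start_wait()` and `advance_frontier()` are hypothetical names. The assumption that the server compares with `>=` is the property the fencepost tests are meant to pin down, not something this sketch verifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-threaded model of waiters on one WAL frontier. */
typedef enum { WAITER_DONE, WAITER_SLEEPING } WaiterState;

typedef struct Waiter
{
    uint64_t    target;   /* LSN the waiter asked for */
    WaiterState state;
} Waiter;

/* Fencepost rule: a target exactly at the frontier is already reached. */
static bool
target_reached(uint64_t frontier, uint64_t target)
{
    return frontier >= target;
}

/* Start a wait: finish immediately if reached, otherwise go to sleep. */
static void
start_wait(Waiter *w, uint64_t frontier, uint64_t target)
{
    w->target = target;
    w->state = target_reached(frontier, target) ? WAITER_DONE
                                                : WAITER_SLEEPING;
}

/* Advance the frontier and wake every sleeper whose target is reached. */
static uint64_t
advance_frontier(uint64_t new_frontier, Waiter *waiters, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (waiters[i].state == WAITER_SLEEPING &&
            target_reached(new_frontier, waiters[i].target))
            waiters[i].state = WAITER_DONE;
    return new_frontier;
}
```

Starting the waiter while the frontier is held back, as the async case does by pausing replay, guarantees it is in `WAITER_SLEEPING` before the advance. A completion can then only come from the wakeup path.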

The cascading timeline-switch test now checks the 'WAIT FOR ...
NO_THROW' status from background psql stdout. The previous log-marker
pattern could pass after an unexpected returned status, including
'timeout', because the following statement would still run. The
'received_tli > 1' check remains, but only as confirmation that the
downstream followed the new timeline; the 'success' status proves the
wait completed as intended.
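The difference between the two checks can be reduced to the C sketch below. Both functions are hypothetical. The real test drives a background psql session, and this model captures only the pass/fail decision. The status strings 'success' and 'timeout' come from the message above. Trimming trailing whitespace from stdout is an assumption about how the output is captured.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Old-style check (sketch): only asks whether the statement after the
 * wait ran, which under NO_THROW happens for any returned status.
 */
static bool
old_check_passes(const char *status, bool next_statement_ran)
{
    (void) status;              /* the status is never inspected */
    return next_statement_ran;
}

/*
 * New-style check (sketch): the status read from stdout must be exactly
 * "success", ignoring trailing newlines and spaces.
 */
static bool
new_check_passes(const char *stdout_text)
{
    const char *expected = "success";
    size_t      len;

    if (stdout_text == NULL)
        return false;
    len = strlen(stdout_text);
    while (len > 0 && (stdout_text[len - 1] == '\n' ||
                       stdout_text[len - 1] == '\r' ||
                       stdout_text[len - 1] == ' '))
        len--;
    return len == strlen(expected) &&
           strncmp(stdout_text, expected, len) == 0;
}
```

A 'timeout' result passes the old check but fails the new one, which is the false positive the revised test removes.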

Please check it.

--
Best,
Xuneng

Attachment Content-Type Size
v5-0003-Remove-redundant-WAIT-FOR-LSN-caller-side-pre-che.patch application/octet-stream 5.2 KB
v5-0002-Fix-memory-ordering-in-WAIT-FOR-LSN-wakeup-mechan.patch application/octet-stream 4.3 KB
v5-0005-Wake-standby_write-standby_flush-waiters-from-the.patch application/octet-stream 5.9 KB
v5-0001-Use-barrier-semantics-when-reading-writing-writte.patch application/octet-stream 3.1 KB
v5-0004-Use-replay-position-as-floor-for-WAIT-FOR-LSN-sta.patch application/octet-stream 8.7 KB
v5-0006-Improve-WAIT-FOR-LSN-test-coverage.patch application/octet-stream 12.6 KB
v5-0007-Document-that-WAIT-FOR-LSN-is-timeline-blind.patch application/octet-stream 1.9 KB
