Re: Implement waiting for wal lsn replay: reloaded

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2026-04-09 16:18:14
Message-ID: d2ogsp54if47w35kx4vu7o3nkfghpuzctwbaudzey5brxavomw@bgfs3p7twoib
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2026-04-09 18:21:24 +0300, Alexander Korotkov wrote:
> I've assembled all the pending patches together.
> 0001 adds memory barrier to GetWalRcvWriteRecPtr() as suggested by
> Andres off-list.

I'd make it a pg_atomic_read_membarrier_u64().

> 0002 is basically [1] by Xuneng, but revised given we have a memory
> barrier in 0001, and my proposal to do ResetLatch() unconditionally
> similar to our other Latch-based loops.
> 0003 and 0004 are [2] by Xuneng.
> 0005 is [3] by Xuneng.
>
> I'm going to add them to Commitfest to run CI over them, and have a
> closer look over them tomorrow.

Briefly skimming the patches, none makes the writes to writtenUpto use
something bearing barrier semantics. I'd just make both of them a
pg_atomic_write_membarrier_u64().

I think this also needs a few more tests, e.g. for the scenario that
29e7dbf5e4d fixed. I think it'd also be good to do some testing for
off-by-one dangers. E.g. making sure that we don't stop waiting too early /
too late. Another one that I think might deserve more testing is waits on the
standby while crossing timeline boundaries.

> From 0e5b4d1b9311a628a70218d89abf12308c9d782f Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
> Date: Thu, 9 Apr 2026 16:49:04 +0300
> Subject: [PATCH v3 1/5] Add a memory barrier to GetWalRcvWriteRecPtr()
>
> Add pg_memory_barrier() before reading writtenUpto so that callers see
> up-to-date shared memory state. This matches the barrier semantics that
> GetWalRcvFlushRecPtr() and other LSN-position functions get implicitly from
> their spinlock acquire/release, and in turn protects from bugs caused by
> expectations of similar barrier guarantees from different LSN-position functions.
>
> Reported-by: Andres Freund <andres(at)anarazel(dot)de>
> Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
> ---
> src/backend/replication/walreceiverfuncs.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
> index bd5d47be964..0408ddff43e 100644
> --- a/src/backend/replication/walreceiverfuncs.c
> +++ b/src/backend/replication/walreceiverfuncs.c
> @@ -363,14 +363,22 @@ GetWalRcvFlushRecPtr(XLogRecPtr *latestChunkStart, TimeLineID *receiveTLI)
>
> /*
> * Returns the last+1 byte position that walreceiver has written.
> - * This returns a recently written value without taking a lock.
> + *
> + * Use a memory barrier to ensure that callers see up-to-date shared memory
> + * state, matching the barrier semantics provided by the spinlock in
> + * GetWalRcvFlushRecPtr() and other LSN-position functions.
> */
> XLogRecPtr
> GetWalRcvWriteRecPtr(void)
> {
> WalRcvData *walrcv = WalRcv;
> + XLogRecPtr recptr;
> +
> + pg_memory_barrier();
>
> - return pg_atomic_read_u64(&walrcv->writtenUpto);
> + recptr = pg_atomic_read_u64(&walrcv->writtenUpto);
> +
> + return recptr;
> }
>
> /*
> --
> 2.39.5 (Apple Git-154)
>

> Subject: [PATCH v3 2/5] Fix memory ordering in WAIT FOR LSN wakeup mechanism

> + /*
> + * Ensure the waker's prior position store (writtenUpto, flushedUpto,
> + * lastReplayedEndRecPtr, etc.) is globally visible before we read
> + * minWaitedLSN. Without this barrier, the CPU could load minWaitedLSN
> + * before draining the position store, leaving the position invisible to a
> + * concurrently-registering waiter.
> + *
> + * This is the waker side of a Dekker-style handshake; pairs with
> + * pg_memory_barrier() in GetCurrentLSNForWaitType() on the waiter side.
> + */
> + pg_memory_barrier();
> +
> /*
> * Fast path check. Skip if currentLSN is InvalidXLogRecPtr, which means
> * "wake all waiters" (e.g., during promotion when recovery ends).

I'd also make this a pg_atomic_read_membarrier_u64() and the write a
pg_atomic_write_membarrier_u64(). It's a lot easier to reason about this
stuff if you make sure that the individual reads / write pair and have
ordering implied.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2026-04-09 16:34:19 Re: Add pg_stat_autovacuum_priority
Previous Message Andres Freund 2026-04-09 16:02:28 Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?