| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
| Subject: | Re: Implement waiting for wal lsn replay: reloaded |
| Date: | 2026-04-10 06:13:05 |
| Message-ID: | CABPTF7Ukk8iJF7TpnK2mFOaboNJgWL1csfXu4e3J4GT0o7x0GQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Apr 10, 2026 at 11:59 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> On Fri, Apr 10, 2026 at 12:18 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > On 2026-04-09 18:21:24 +0300, Alexander Korotkov wrote:
> > > I've assembled all the pending patches together.
> > > 0001 adds memory barrier to GetWalRcvWriteRecPtr() as suggested by
> > > Andres off-list.
> >
> > I'd make it a pg_atomic_read_membarrier_u64().
> >
> >
> > > 0002 is basically [1] by Xuneng, but revised given we have a memory
> > > barrier in 0001, and my proposal to do ResetLatch() unconditionally
> > > similar to our other Latch-based loops.
> > > 0003 and 0004 are [2] by Xuneng.
> > > 0005 is [3] by Xuneng.
> > >
> > > I'm going to add them to Commitfest to run CI over them, and have a
> > > closer look over them tomorrow.
> >
> > Briefly skimming the patches, none makes the writes to writtenUpto use
> > something bearing barrier semantics. I'd just make both of them a
> > pg_atomic_write_membarrier_u64().
> >
>
> Makes sense to me. Done.
>
> > I think this also needs a few more tests, e.g. for the scenario that
> > 29e7dbf5e4d fixed. I think it'd also be good to do some testing for
> > off-by-one dangers. E.g. making sure that we don't stop waiting too early /
> > too late. Another one that I think might deserve more testing is waits on the
> > standby while crossing timeline boundaries.
> >
>
> I'll prepare a new patch for more test harnessing.
>
> >
> > > From 0e5b4d1b9311a628a70218d89abf12308c9d782f Mon Sep 17 00:00:00 2001
> > > From: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
> > > Date: Thu, 9 Apr 2026 16:49:04 +0300
> > > Subject: [PATCH v3 1/5] Add a memory barrier to GetWalRcvWriteRecPtr()
> > >
> > > Add pg_memory_barrier() before reading writtenUpto so that callers see
> > > up-to-date shared memory state. This matches the barrier semantics that
> > > GetWalRcvFlushRecPtr() and other LSN-position functions get implicitly from
> > > their spinlock acquire/release, and in turn protects from bugs caused by
> > > expectations of similar barrier guarantees from different LSN-position functions.
> > >
> > > Reported-by: Andres Freund <andres(at)anarazel(dot)de>
> > > Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
> > > ---
> > > src/backend/replication/walreceiverfuncs.c | 12 ++++++++++--
> > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
> > > index bd5d47be964..0408ddff43e 100644
> > > --- a/src/backend/replication/walreceiverfuncs.c
> > > +++ b/src/backend/replication/walreceiverfuncs.c
> > > @@ -363,14 +363,22 @@ GetWalRcvFlushRecPtr(XLogRecPtr *latestChunkStart, TimeLineID *receiveTLI)
> > >
> > > /*
> > > * Returns the last+1 byte position that walreceiver has written.
> > > - * This returns a recently written value without taking a lock.
> > > + *
> > > + * Use a memory barrier to ensure that callers see up-to-date shared memory
> > > + * state, matching the barrier semantics provided by the spinlock in
> > > + * GetWalRcvFlushRecPtr() and other LSN-position functions.
> > > */
> > > XLogRecPtr
> > > GetWalRcvWriteRecPtr(void)
> > > {
> > > WalRcvData *walrcv = WalRcv;
> > > + XLogRecPtr recptr;
> > > +
> > > + pg_memory_barrier();
> > >
> > > - return pg_atomic_read_u64(&walrcv->writtenUpto);
> > > + recptr = pg_atomic_read_u64(&walrcv->writtenUpto);
> > > +
> > > + return recptr;
> > > }
> > >
> > > /*
> > > --
> > > 2.39.5 (Apple Git-154)
> > >
> >
> > > Subject: [PATCH v3 2/5] Fix memory ordering in WAIT FOR LSN wakeup mechanism
> >
> > > + /*
> > > + * Ensure the waker's prior position store (writtenUpto, flushedUpto,
> > > + * lastReplayedEndRecPtr, etc.) is globally visible before we read
> > > + * minWaitedLSN. Without this barrier, the CPU could load minWaitedLSN
> > > + * before draining the position store, leaving the position invisible to a
> > > + * concurrently-registering waiter.
> > > + *
> > > + * This is the waker side of a Dekker-style handshake; pairs with
> > > + * pg_memory_barrier() in GetCurrentLSNForWaitType() on the waiter side.
> > > + */
> > > + pg_memory_barrier();
> > > +
> > > /*
> > > * Fast path check. Skip if currentLSN is InvalidXLogRecPtr, which means
> > > * "wake all waiters" (e.g., during promotion when recovery ends).
> >
> > I'd also make this a pg_atomic_read_membarrier_u64() and the write a
> > pg_atomic_write_membarrier_u64(). It's a lot easier to reason about this
> > stuff if you make sure that the individual reads / write pair and have
> > ordering implied.
> >
>
> It does look more selft-contained to me.
>
> Here is the updated patch set based on Alexander’s earlier version.
The last para of commit message in patch 2 is inaccurate after the
getter-side barrier changes, "could read a stale position and wrongly
timeout" is no longer the primary rationale.
--
Best,
Xuneng
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0001-Use-barrier-semantics-when-reading-writing-writte.patch | application/octet-stream | 3.0 KB |
| v4-0005-Wake-standby_write-standby_flush-waiters-from-the.patch | application/octet-stream | 5.9 KB |
| v4-0004-Use-replay-position-as-floor-for-WAIT-FOR-LSN.patch | application/octet-stream | 8.7 KB |
| v4-0003-Remove-redundant-WAIT-FOR-LSN-caller-side-pre-che.patch | application/octet-stream | 5.2 KB |
| v4-0002-Fix-memory-ordering-in-WAIT-FOR-LSN-wakeup-mechan.patch | application/octet-stream | 4.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Fujii Masao | 2026-04-10 06:16:37 | Re: Use proc_exit() in WalRcvWaitForStartPosition |
| Previous Message | Michael Paquier | 2026-04-10 06:12:41 | Re: Fix pgstat_database.c to honor passed database OIDs |