Re: Implement waiting for wal lsn replay: reloaded

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2026-04-07 04:02:47
Message-ID: CABPTF7WPRVJGdDeuWo-=3csnDs5FGfUsg97xppPyzoj9fRAMeQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

On Tue, Apr 7, 2026 at 11:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2026-04-06 23:07:45 -0400, Andres Freund wrote:
> > But, leaving that aside, looking at this code I'm somewhat concerned - it
> > seems to not worry at all about memory ordering?
> >
> >
> > static void
> > XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, TimeLineID tli)
> > ...
> > /* Update shared-memory status */
> > pg_atomic_write_u64(&WalRcv->writtenUpto, LogstreamResult.Write);
> >
> > /*
> > * If we wrote an LSN that someone was waiting for, notify the waiters.
> > */
> > if (waitLSNState &&
> > (LogstreamResult.Write >=
> > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> > WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
> >
> > There are no memory barriers here, so the CPU would be entirely free to not
> > make the writtenUpto write visible to a waiter that's in the process of
> > registering and is checking whether it needs to wait in WaitForLSN().
> >
> > And WaitForLSN()->GetCurrentLSNForWaitType()->GetWalRcvWriteRecPtr() also has
> > no barriers. That MAYBE is ok, due addLSNWaiter() providing the barrier at
> > loop entry and maybe kinda you can think that WaitLatch() will somehow also
> > have barrier semantic. But if so, that would need to be very carefully
> > documented. And it seems completely unnecessary here, it's hard to believe
> > using a barrier (via pg_atomic_read_membarrier_u64() or such) would be a
> > performance issue

Thanks for pointing this out. This is indeed a store-load ordering issue.

> And separately from the memory ordering, how can it make sense that there's
> at least 5 copies of this
>
> if (waitLSNState &&
> (LogstreamResult.Flush >=
> pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_FLUSH])))
> WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_FLUSH, LogstreamResult.Flush);
>
> around? That needs to be encapsulated so that if you have a bug, like the
> memory ordering problem I describe above, it can be fixed once, not in
> multiple places.

Yeah, this duplication is not ok.

> And why do these callers even have that pre-check? Seems WaitLSNWakeup()
> does so itself?
>
> /*
> * Fast path check. Skip if currentLSN is InvalidXLogRecPtr, which means
> * "wake all waiters" (e.g., during promotion when recovery ends).
> */
> if (XLogRecPtrIsValid(currentLSN) &&
> pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
> return;
>
> And why is the code checking if waitLSNState is non-NULL?
>

These fast checks are unnecessary copy-pastos and waitLSNState checks
also do not make sense except for the one in WaitLSNCleanup.

I'll prepare a patch set addressing them.

--
Best,
Xuneng

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2026-04-07 04:03:52 Re: SQL:2011 Application Time Update & Delete
Previous Message jie wang 2026-04-07 03:59:14 Re: Return DSA area for hash table from GetNamedDSHash()