| From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
| Subject: | Re: Implement waiting for wal lsn replay: reloaded |
| Date: | 2026-04-07 12:52:03 |
| Message-ID: | CAPpHfds=B=ZNcfxPqFJ8ZVJB6surey+cP+8iDLfj5qM3Xvx=bg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Apr 7, 2026 at 7:03 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> On Tue, Apr 7, 2026 at 11:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2026-04-06 23:07:45 -0400, Andres Freund wrote:
> > > But, leaving that aside, looking at this code I'm somewhat concerned - it
> > > seems to not worry at all about memory ordering?
> > >
> > >
> > > static void
> > > XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, TimeLineID tli)
> > > ...
> > > /* Update shared-memory status */
> > > pg_atomic_write_u64(&WalRcv->writtenUpto, LogstreamResult.Write);
> > >
> > > /*
> > > * If we wrote an LSN that someone was waiting for, notify the waiters.
> > > */
> > > if (waitLSNState &&
> > > (LogstreamResult.Write >=
> > > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> > > WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
> > >
> > > There are no memory barriers here, so the CPU would be entirely free to not
> > > make the writtenUpto write visible to a waiter that's in the process of
> > > registering and is checking whether it needs to wait in WaitForLSN().
> > >
> > > And WaitForLSN()->GetCurrentLSNForWaitType()->GetWalRcvWriteRecPtr() also has
> > > no barriers. That MAYBE is ok, due addLSNWaiter() providing the barrier at
> > > loop entry and maybe kinda you can think that WaitLatch() will somehow also
> > > have barrier semantic. But if so, that would need to be very carefully
> > > documented. And it seems completely unnecessary here, it's hard to believe
> > > using a barrier (via pg_atomic_read_membarrier_u64() or such) would be a
> > > performance issue
>
> Thanks for pointing this out. This is indeed a store-load ordering issue.
>
> > And separately from the memory ordering, how can it make sense that there's
> > at least 5 copies of this
> >
> > if (waitLSNState &&
> > (LogstreamResult.Flush >=
> > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_FLUSH])))
> > WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_FLUSH, LogstreamResult.Flush);
> >
> > around? That needs to be encapsulated so that if you have a bug, like the
> > memory ordering problem I describe above, it can be fixed once, not in
> > multiple places.
>
> Yeah, this duplication is not ok.
>
> > And why do these callers even have that pre-check? Seems WaitLSNWakeup()
> > does so itself?
> >
> > /*
> > * Fast path check. Skip if currentLSN is InvalidXLogRecPtr, which means
> > * "wake all waiters" (e.g., during promotion when recovery ends).
> > */
> > if (XLogRecPtrIsValid(currentLSN) &&
> > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
> > return;
> >
> > And why is the code checking if waitLSNState is non-NULL?
> >
>
> These fast checks are unnecessary copy-pastos and waitLSNState checks
> also do not make sense except for the one in WaitLSNCleanup.
>
> I'll prepare a patch set addressing them.
Thanks to Tom and Andres for catching these issues!
I'm planning to work on this during this evening. I'll review
Xuneng's patches (or write my own if they wouldn't arrive yet).
------
Regards,
Alexander Korotkov
Supabase
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Xuneng Zhou | 2026-04-07 12:59:58 | Re: Implement waiting for wal lsn replay: reloaded |
| Previous Message | Sergei Patiakin | 2026-04-07 12:49:27 | Inconsistent trigger behavior between two temporal leftovers |