| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
| Subject: | Re: Implement waiting for wal lsn replay: reloaded |
| Date: | 2026-04-07 13:05:40 |
| Message-ID: | CABPTF7VRP97GPwPTiu89xYQMA5pfWknsLSSxnpq11mu+-FiDRA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Alexander,
On Tue, Apr 7, 2026 at 8:52 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Tue, Apr 7, 2026 at 7:03 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > On Tue, Apr 7, 2026 at 11:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > Hi,
> > >
> > > On 2026-04-06 23:07:45 -0400, Andres Freund wrote:
> > > > But, leaving that aside, looking at this code I'm somewhat concerned - it
> > > > seems to not worry at all about memory ordering?
> > > >
> > > >
> > > > static void
> > > > XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, TimeLineID tli)
> > > > ...
> > > > /* Update shared-memory status */
> > > > pg_atomic_write_u64(&WalRcv->writtenUpto, LogstreamResult.Write);
> > > >
> > > > /*
> > > > * If we wrote an LSN that someone was waiting for, notify the waiters.
> > > > */
> > > > if (waitLSNState &&
> > > > (LogstreamResult.Write >=
> > > > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> > > > WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
> > > >
> > > > There are no memory barriers here, so the CPU would be entirely free to not
> > > > make the writtenUpto write visible to a waiter that's in the process of
> > > > registering and is checking whether it needs to wait in WaitForLSN().
> > > >
> > > > And WaitForLSN()->GetCurrentLSNForWaitType()->GetWalRcvWriteRecPtr() also has
> > > > no barriers. That MAYBE is ok, due addLSNWaiter() providing the barrier at
> > > > loop entry and maybe kinda you can think that WaitLatch() will somehow also
> > > > have barrier semantic. But if so, that would need to be very carefully
> > > > documented. And it seems completely unnecessary here, it's hard to believe
> > > > using a barrier (via pg_atomic_read_membarrier_u64() or such) would be a
> > > > performance issue
> >
> > Thanks for pointing this out. This is indeed a store-load ordering issue.
> >
> > > And separately from the memory ordering, how can it make sense that there's
> > > at least 5 copies of this
> > >
> > > if (waitLSNState &&
> > > (LogstreamResult.Flush >=
> > > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_FLUSH])))
> > > WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_FLUSH, LogstreamResult.Flush);
> > >
> > > around? That needs to be encapsulated so that if you have a bug, like the
> > > memory ordering problem I describe above, it can be fixed once, not in
> > > multiple places.
> >
> > Yeah, this duplication is not ok.
> >
> > > And why do these callers even have that pre-check? Seems WaitLSNWakeup()
> > > does so itself?
> > >
> > > /*
> > > * Fast path check. Skip if currentLSN is InvalidXLogRecPtr, which means
> > > * "wake all waiters" (e.g., during promotion when recovery ends).
> > > */
> > > if (XLogRecPtrIsValid(currentLSN) &&
> > > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
> > > return;
> > >
> > > And why is the code checking if waitLSNState is non-NULL?
> > >
> >
> > These fast checks are unnecessary copy-pastos and waitLSNState checks
> > also do not make sense except for the one in WaitLSNCleanup.
> >
> > I'll prepare a patch set addressing them.
>
> Thanks to Tom and Andres for catching these issues!
> I'm planning to work on this during this evening. I'll review
> Xuneng's patches (or write my own if they wouldn't arrive yet).
>
I’ve posted two patches. The first fixes the duplication issue
reported by Andres and is fairly straightforward. The second turned
out to be more complex than expected, and I’m still working through
possible solutions. Feedback or alternative approaches would be very
helpful.
I also spent some time drafting a patch to address the memory ordering
issue and will post it later.
--
Best,
Xuneng
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Heikki Linnakangas | 2026-04-07 13:07:45 | Re: pg_buffercache: Add per-relation summary stats |
| Previous Message | Evan Montgomery-Recht | 2026-04-07 12:59:59 | Re: Eliminating SPI / SQL from some RI triggers - take 3 |