Re: Implement waiting for wal lsn replay: reloaded

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2025-11-16 13:25:39
Message-ID: CABPTF7WXBWO8qjLOqB-JwcQV60Yh7tqAFF2QxGA-FM=e-gnY9A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alexander,

On Sat, Nov 15, 2025 at 6:29 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> Hi, Xuneng!
>
> On Fri, Nov 14, 2025 at 3:50 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 14, 2025 at 4:32 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> > >
> > > On 11/5/25 10:51, Alexander Korotkov wrote:
> > > > Hi!
> > > >
> > > > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > >> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > >>> On 2025-Nov-03, Alexander Korotkov wrote:
> > > >>>
> > > >>>> I'd like to give this subject another chance for pg19. I'm going to
> > > >>>> push this if no objections.
> > > >>>
> > > >>> Sure. I don't understand why patches 0002 and 0003 are separate though.
> > > >>
> > > >> FWIW, I appreciate such splits. Even if the functionality isn't usable
> > > >> independently, it's still different type of code that's affected. And the
> > > >> patches are each big enough to make that worthwhile for easier review.
> > > >
> > > > Thank you for the feedback, pushed.
> > > >
> > >
> > > Hi,
> > >
> > > The new TAP test 049_wait_for_lsn.pl introduced by this commit, because
> > > it takes a long time - about 65 seconds on my laptop. That's about 25%
> > > of the whole src/test/recovery, more than any other test.
> > >
> > > And most of the time there's nothing happening - these are the two log
> > > messages showing the 60-second wait:
> > >
> > > 2025-11-13 21:12:39.949 CET checkpointer[562597] LOG: checkpoint
> > > complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s)
> > > added, 0 removed, 2 recycled; write=0.906 s, sync=0.001 s, total=0.907
> > > s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB,
> > > estimate=32768 kB; lsn=0/040000B8, redo lsn=0/04000060
> > >
> > > 2025-11-13 21:13:38.994 CET client backend[562727] 049_wait_for_lsn.pl
> > > ERROR: recovery is not in progress
> > >
> > > So there's a checkpoint, 60 seconds of nothing, and then a failure. I
> > > haven't looked into why it waits for 1 minute exactly, but adding 60
> > > seconds to check-world is somewhat annoying.
> >
> > Thanks for looking into this!
> >
> > I did a quick analysis for this prolonged waiting:
> >
> > In WaitLSNWakeup() (xlogwait.c:267), the fast-path check incorrectly
> > handled InvalidXLogRecPtr:
> > /* Fast path check */
> > if (pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
> > return; // Issue: Returns early when currentLSN = 0
> >
> > When currentLSN = InvalidXLogRecPtr (0), meaning "wake all waiters",
> > the check compared:
> > - minWaitedLSN (e.g., 0x570CC048) > 0 → TRUE
> > - Result: function returned early without waking anyone
> >
> > When It Happened
> > During standby promotion, xlog.c:6246 calls:
> >
> > WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, InvalidXLogRecPtr);
> >
> > This should wake all LSN waiters, but the bug prevented it. WAIT FOR
> > LSN commands could wait indefinitely. Test 049_wait_for_lsn.pl took 68
> > seconds instead of ~9 seconds.
> >
> > if the above analysis is sound, the fix could be like:
> >
> > Proposed fix:
> > Added a validity check before the comparison:
> > /*
> > * Fast path check. Skip if currentLSN is InvalidXLogRecPtr, which means
> > * "wake all waiters" (e.g., during promotion when recovery ends).
> > */
> > if (XLogRecPtrIsValid(currentLSN) &&
> > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
> > return;
> >
> > Result:
> > Test time: 68s → 9s
> > WAIT FOR LSN exits immediately on promotion (62ms vs 60s)
> >
> > > While at it, I noticed a couple comments refer to WaitForLSNReplay, but
> > > but I think that got renamed simply to WaitForLSN.
> >
> > Please check the attached patch for replacing them.
>
> Thank you so much for your patches!
> Pushed with minor corrections.

Thanks for pushing! It appears I should be running pgindent more regularly :).

--
Best,
Xuneng

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Xuneng Zhou 2025-11-16 13:30:08 Re: Implement waiting for wal lsn replay: reloaded
Previous Message Tender Wang 2025-11-16 12:41:26 Re: Use opresulttype instead of calling SearchSysCache1() in match_orclause_to_indexcol()