Re: Implement waiting for wal lsn replay: reloaded

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: 2026-04-09 15:21:24
Message-ID: CAPpHfduhQsm44j_ziZ6ykFTDZ2SFvZp04iCrc5_dD9PGZUrhrw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 8, 2026 at 7:59 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > > Patch 0001 looks OK for me.
> > > Regarding patch 0002. Changes made for GetCurrentLSNForWaitType()
> > > looks reliable for me. PerformWalRecovery() sets replayed positions
> > > before starting recovery, and in turn before standby can accept
> > > connections. So, changes to WalReceiverMain() don't look necessary to
> > > me.
> >
> > Yeah, GetCurrentLSNForWaitType seems to be the right place to place
> > the fix. Please see the attached patch 2.
> >
> > I also noticed another relevant problem:
> >
> > During pure archive recovery (no walreceiver), a backend that issues
> > WAIT FOR LSN ... MODE 'standby_write' with a target ahead of the
> > current replay position will sleep forever; the startup process
> > replays past the target but only wakes 'STANDBY_REPLAY' waiters.
> >
> > This also affects mixed scenarios: the walreceiver may lag behind
> > replay (e.g., archive restore has delivered WAL faster than
> > streaming), so a 'standby_write' waiter could be waiting on WAL that
> > replay has already consumed.
> >
> > I will write a patch to address this soon.
> >
>
> Here is the patch.
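
To make the archive-recovery problem quoted above concrete, here is a
minimal sketch of the "replay position as floor" idea. It is not taken
from the patches: effective_write_lsn() and
standby_write_target_reached() are hypothetical names, and only the
XLogRecPtr typedef and InvalidXLogRecPtr mirror PostgreSQL definitions.
The assumption is that a standby_write waiter should be satisfied as
soon as either the walreceiver write position or the replay position
has reached its target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* 64-bit LSN, as in PostgreSQL */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical helper: the position a standby_write waiter is compared
 * against.  Replay can never be ahead of WAL that exists on the standby,
 * so the replayed position is used as a lower bound ("floor") for the
 * walreceiver's written position.
 */
static XLogRecPtr
effective_write_lsn(XLogRecPtr walrcv_written, XLogRecPtr replayed)
{
    return (replayed > walrcv_written) ? replayed : walrcv_written;
}

/* Hypothetical check: has a standby_write waiter's target been reached? */
static bool
standby_write_target_reached(XLogRecPtr target,
                             XLogRecPtr walrcv_written,
                             XLogRecPtr replayed)
{
    assert(target != InvalidXLogRecPtr);
    return target <= effective_write_lsn(walrcv_written, replayed);
}
```

With no walreceiver running, walrcv_written stays at its initial value,
so in this sketch the comparison falls back entirely to the replayed
position and the waiter no longer depends on a process that does not
exist.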

I've assembled all the pending patches together.
0001 adds a memory barrier to GetWalRcvWriteRecPtr(), as suggested by
Andres off-list.
0002 is basically [1] by Xuneng, revised to account for the memory
barrier added in 0001 and to include my proposal to call ResetLatch()
unconditionally, as our other latch-based loops do.
0003 and 0004 are [2] by Xuneng.
0005 is [3] by Xuneng.
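
For readers following along, the ordering that 0001 and 0002 are
concerned with can be sketched with a toy, single-process model. This
is not the patch code and does not use PostgreSQL's latch API: ToyLatch
and every toy_* name below are hypothetical, and C11 sequentially
consistent atomics stand in for the barriers. The point it illustrates
is only the general pattern: the waker publishes the new position
before setting the latch, and the waiter resets the latch
unconditionally before rechecking the position.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* 64-bit LSN, as in PostgreSQL */

/* Toy stand-in for a latch: just a flag (hypothetical, not PostgreSQL's). */
typedef struct ToyLatch
{
    atomic_bool is_set;
} ToyLatch;

static _Atomic XLogRecPtr toy_written_lsn;  /* position published by the waker */
static ToyLatch toy_latch;                  /* the waiter's latch */

static void
toy_set_latch(ToyLatch *latch)
{
    atomic_store(&latch->is_set, true);
}

static void
toy_reset_latch(ToyLatch *latch)
{
    atomic_store(&latch->is_set, false);
}

/* Waker side: publish the new position first, then set the latch. */
static void
toy_advance_and_wake(XLogRecPtr new_lsn)
{
    assert(new_lsn >= atomic_load(&toy_written_lsn));  /* only moves forward */
    atomic_store(&toy_written_lsn, new_lsn);
    toy_set_latch(&toy_latch);
}

/*
 * Waiter side: one loop iteration.  The latch is reset unconditionally
 * before the position is read, so a wakeup arriving after the reset
 * leaves the latch set and the next wait returns immediately.
 * Returns true when the target is reached; false means the caller
 * would now block until the latch is set.
 */
static bool
toy_waiter_iteration(XLogRecPtr target)
{
    toy_reset_latch(&toy_latch);
    return atomic_load(&toy_written_lsn) >= target;
}
```

If the waiter read the position first and reset the latch afterwards, a
wakeup landing between the two steps would be cleared without the new
position ever being seen, which is the kind of lost wakeup this
ordering avoids.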

I'm going to add them to the Commitfest so that CI runs on them, and
I'll take a closer look at them tomorrow.

Links.
1. https://www.postgresql.org/message-id/CABPTF7Wjk_FbOghyr09Rzu6T2bh-L_KBMqHK%2BzhRXpssU0STyQ%40mail.gmail.com
2. https://www.postgresql.org/message-id/CABPTF7X0iV%3DkGC4gjsTj4NvK_NNEJGM3YTc7Obxs5GOiYoMhEw%40mail.gmail.com
3. https://www.postgresql.org/message-id/CABPTF7UBdEfyxATWntmCfoJrwB6iPrnhkXO7y_Avmqc2bOn27A%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Attachment                                                       Content-Type              Size
v3-0002-Fix-memory-ordering-in-WAIT-FOR-LSN-wakeup-mechan.patch  application/octet-stream  3.2 KB
v3-0001-Add-a-memory-barrier-to-GetWalRcvWriteRecPtr.patch       application/octet-stream  1.8 KB
v3-0004-Use-replay-position-as-floor-for-WAIT-FOR-LSN-sta.patch  application/octet-stream  8.7 KB
v3-0003-Remove-redundant-WAIT-FOR-LSN-caller-side-pre-che.patch  application/octet-stream  5.2 KB
v3-0005-Wake-standby_write-standby_flush-waiters-from-the.patch  application/octet-stream  6.0 KB
