| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
| Subject: | Re: Implement waiting for wal lsn replay: reloaded |
| Date: | 2026-04-08 03:23:54 |
| Message-ID: | CABPTF7X0iV=kGC4gjsTj4NvK_NNEJGM3YTc7Obxs5GOiYoMhEw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Apr 8, 2026 at 7:23 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> Hi, Xuneng!
>
> > Here is some analysis of the issue reported by Tom:
> >
> > 1) The problem
> >
> > WAIT FOR LSN with standby_write or standby_flush mode can block
> > indefinitely on an idle primary even when the target LSN is already
> > satisfied by WAL on disk.
> >
> > The walreceiver initializes its process-local LogstreamResult.Write
> > and LogstreamResult.Flush from GetXLogReplayRecPtr() at connect time,
> > reflecting all WAL already present on the standby (from a base backup,
> > archive restore, or prior streaming). The shared-memory positions used
> > by WAIT FOR LSN, however, are not seeded from this value:
> >
> > WalRcv->writtenUpto is zero-initialized by ShmemInitStruct and remains
> > 0 until XLogWalRcvWrite() processes incoming streaming data.
> > WalRcv->flushedUpto is initialized to the segment-aligned streaming
> > start point by RequestXLogStreaming(), which may be significantly
> > behind the replay position. It advances only when XLogWalRcvFlush()
> > processes new data — which itself requires LogstreamResult.Flush <
> > LogstreamResult.Write, a condition that never holds at startup since
> > both fields are initialized to the same value.
> >
> > When the primary is idle and sends no new WAL, both positions stay at
> > their initial stale values indefinitely.
> >
> > 2) The fix
> > Seed writtenUpto and flushedUpto from LogstreamResult immediately
> > after the walreceiver initializes those process-local fields, then
> > call WaitLSNWakeup() to wake any already-blocked waiters.
> >
> > This broadens the semantics of these fields. writtenUpto and
> > flushedUpto used to track only WAL written or flushed by the current
> > walreceiver session — WAL received from the primary since the most
> > recent connect. After this change, they are initialized to the replay
> > position, so they also cover WAL that was already on disk before
> > streaming began. This affects pg_stat_wal_receiver.written_lsn and
> > flushed_lsn, which will now report the replay position immediately at
> > walreceiver startup rather than 0 and the segment boundary
> > respectively. I am still considering whether this semantic change is
> > acceptable though it does shorten the runtime of the tap tests
> > reported by Tom in my test. Another approach is to modify the logic of
> > GetCurrentLSNForWaitType to cope with this special case and leave the
> > publisher side alone without changing the semantics. But this seems to
> > be more subtle.
>
> Patch 0001 looks OK for me.
> Regarding patch 0002. Changes made for GetCurrentLSNForWaitType()
> looks reliable for me. PerformWalRecovery() sets replayed positions
> before starting recovery, and in turn before standby can accept
> connections. So, changes to WalReceiverMain() don't look necessary to
> me.
Yeah, GetCurrentLSNForWaitType seems to be the right place for the
fix. Please see the attached patch 0002.
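
For readers without the patch at hand, here is a minimal sketch of the
"replay position as floor" idea, inferred only from the patch title and the
discussion above. It is not the attached patch. All names prefixed with
sketch_ or SKETCH_ are hypothetical, and XLogRecPtr is modelled as a plain
64-bit integer. The assumption is that WAL which has been replayed is already
on disk, so the replay position is a safe lower bound for the written and
flushed positions.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/* Hypothetical wait modes, named after the modes discussed in this thread. */
typedef enum
{
    SKETCH_WAIT_STANDBY_REPLAY,
    SKETCH_WAIT_STANDBY_WRITE,
    SKETCH_WAIT_STANDBY_FLUSH
} SketchWaitType;

/* Hypothetical snapshot of the three positions; not a real PostgreSQL struct. */
typedef struct
{
    XLogRecPtr replay;   /* position replayed by the startup process */
    XLogRecPtr written;  /* models WalRcv->writtenUpto */
    XLogRecPtr flushed;  /* models WalRcv->flushedUpto */
} SketchPositions;

static XLogRecPtr
sketch_max_lsn(XLogRecPtr a, XLogRecPtr b)
{
    return (a > b) ? a : b;
}

/*
 * Return the position a waiter of the given type is compared against.
 * For the write and flush modes the replay position acts as a floor, so a
 * stale walreceiver position cannot hide WAL that was already replayed.
 */
static XLogRecPtr
sketch_current_lsn_for_wait_type(SketchWaitType type, const SketchPositions *pos)
{
    switch (type)
    {
        case SKETCH_WAIT_STANDBY_REPLAY:
            return pos->replay;
        case SKETCH_WAIT_STANDBY_WRITE:
            return sketch_max_lsn(pos->written, pos->replay);
        case SKETCH_WAIT_STANDBY_FLUSH:
            return sketch_max_lsn(pos->flushed, pos->replay);
    }
    return pos->replay;     /* not reached for valid wait types */
}
```

With written = 0 and flushed at a segment boundary behind replay, as in the
idle-primary case described above, both write and flush modes would report the
replay position in this sketch, so a target at or below it is satisfied
without touching WalReceiverMain().
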

I also noticed another related problem:

During pure archive recovery (no walreceiver), a backend that issues
WAIT FOR LSN ... MODE 'standby_write' with a target ahead of the
current replay position will sleep forever: the startup process
replays past the target but wakes only 'STANDBY_REPLAY' waiters.
This also affects mixed scenarios: the walreceiver may lag behind
replay (e.g., archive restore has delivered WAL faster than
streaming), so a 'standby_write' waiter could be waiting on WAL that
replay has already consumed.
I will write a patch to address this soon.
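
To make the missed wakeup concrete, here is a toy model of the behaviour
described above. It is not PostgreSQL code: ToyWaiter, ToyWaitType and
toy_wakeup() are hypothetical names, and the only thing modelled is that a
wakeup call is scoped to a single wait type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/* Hypothetical wait types for this toy model only. */
typedef enum
{
    TOY_STANDBY_REPLAY,
    TOY_STANDBY_WRITE
} ToyWaitType;

/* One sleeping backend: what it waits on, its target, and whether it woke. */
typedef struct
{
    ToyWaitType type;
    XLogRecPtr  target;
    bool        awake;
} ToyWaiter;

/*
 * Wake every sleeping waiter of the given type whose target has been
 * reached. Waiters of other types are left untouched, which is the
 * behaviour that causes the hang described above. Returns the number of
 * waiters woken by this call.
 */
static int
toy_wakeup(ToyWaiter *waiters, size_t n, ToyWaitType type, XLogRecPtr current)
{
    int woken = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!waiters[i].awake &&
            waiters[i].type == type &&
            waiters[i].target <= current)
        {
            waiters[i].awake = true;
            woken++;
        }
    }
    return woken;
}
```

If the startup process only ever calls the equivalent of
toy_wakeup(..., TOY_STANDBY_REPLAY, replay_lsn) and no walreceiver exists to
issue the write-side call, a TOY_STANDBY_WRITE waiter stays asleep no matter
how far replay advances.
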
--
Best,
Xuneng
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Remove-redundant-WAIT-FOR-LSN-caller-side-pre-che.patch | application/x-patch | 5.0 KB |
| v1-0002-Use-replay-position-as-floor-for-WAIT-FOR-LSN-sta.patch | application/octet-stream | 8.6 KB |