Re: Add WALRCV_CONNECTING state to walreceiver

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Add WALRCV_CONNECTING state to walreceiver
Date: 2025-12-14 04:45:46
Message-ID: CABPTF7UkUUxy6z8a2fcOkkxG=OgG1Ae0fJxnr7syz3wX5KjO6g@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Fri, Dec 12, 2025 at 9:52 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Fri, Dec 12, 2025 at 4:45 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi Noah,
> >
> > On Fri, Dec 12, 2025 at 1:05 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > >
> > > On Fri, Dec 12, 2025 at 12:51:00PM +0800, Xuneng Zhou wrote:
> > > > Bug #19093 [1] reported that pg_stat_wal_receiver.status = 'streaming'
> > > > does not accurately reflect streaming health. In that discussion,
> > > > Noah noted that even before the reported regression, status =
> > > > 'streaming' was unreliable because walreceiver sets it during early
> > > > startup, before attempting a connection. He suggested:
> > > >
> > > > "Long-term, in master only, perhaps we should introduce another status
> > > > like 'connecting'. Perhaps enact the connecting->streaming status
> > > > transition just before tendering the first byte of streamed WAL to the
> > > > startup process. Alternatively, enact that transition when the startup
> > > > process accepts the first streamed byte."
> > >
> > > > == Proposal ==
> > > >
> > > > Introduce WALRCV_CONNECTING as an intermediate state between STARTING
> > > > and STREAMING:
> > > >
> > > > - When walreceiver starts, it enters CONNECTING (instead of going
> > > > directly to STREAMING).
> > > > - The transition to STREAMING occurs in XLogWalRcvFlush(), inside the
> > > > existing spinlock-protected block that updates flushedUpto.
> > >
> > > I think this has the drawback that if the primary's WAL is incompatible,
> > > e.g. unacceptable timeline, the walreceiver will still briefly enter
> > > STREAMING. That could trick monitoring.
> >
> > Thanks for pointing this out.
> >
> > > Waiting for applyPtr to advance
> > > would avoid the short-lived STREAMING. What's the feasibility of that?
> >
> > I think this could work, but with complications. If replay latency is
> > high or replay is paused with pg_wal_replay_pause, the WalReceiver
> > would stay in the CONNECTING state longer than expected. Whether this
> > is ok depends on the definition of the 'connecting' state. For the
> > implementation, deciding where and when to check applyPtr against LSNs
> > like receiveStart is more difficult—the WalReceiver doesn't know when
> > applyPtr advances. While the WalReceiver can read applyPtr from shared
> > memory, it isn't automatically notified when that pointer advances.
> > This leads to latency between checking and replay if this is done in
> > the WalReceiver part unless we let the startup process set the state,
> > which would couple the two components. Am I missing something here?
> >
>
> After some thought, a potential approach could be to expose a new
> function in the WAL receiver that transitions the state from
> CONNECTING to STREAMING. This function can then be invoked directly
> from WaitForWALToBecomeAvailable in the startup process, ensuring the
> state change aligns with the actual acceptance of the WAL stream.
>

V2 makes the transition from WALRCV_CONNECTING to WALRCV_STREAMING only
when the startup process has processed the first valid WAL record. A new
function, WalRcvSetStreaming, is introduced to perform the transition.
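
To illustrate the intended ordering, here is a standalone sketch. It is not
the patch: the demo_* names and the DemoWalRcv struct are hypothetical, the
state enum is trimmed to the states discussed here, and a C11 atomic_flag
stands in for the walreceiver spinlock. The only point it tries to show is
that the walreceiver reports CONNECTING, and that the startup-side call moves
the state to STREAMING only if it is still CONNECTING.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Trimmed stand-in for WalRcvState; WALRCV_CONNECTING is the proposed state. */
typedef enum
{
    WALRCV_STOPPED,
    WALRCV_STARTING,
    WALRCV_CONNECTING,
    WALRCV_STREAMING
} WalRcvState;

/* Hypothetical stand-in for the shared walreceiver control struct. */
typedef struct
{
    atomic_flag lock;       /* assumption: stands in for the spinlock */
    WalRcvState state;
} DemoWalRcv;

static void
demo_lock(DemoWalRcv *w)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire))
        ;
}

static void
demo_unlock(DemoWalRcv *w)
{
    atomic_flag_clear_explicit(&w->lock, memory_order_release);
}

void
demo_walrcv_init(DemoWalRcv *w)
{
    atomic_flag_clear(&w->lock);
    w->state = WALRCV_STOPPED;
}

/* Walreceiver side: on startup, report CONNECTING instead of STREAMING. */
void
demo_walrcv_begin_connect(DemoWalRcv *w)
{
    demo_lock(w);
    w->state = WALRCV_CONNECTING;
    demo_unlock(w);
}

/*
 * Startup-process side: called after the first valid streamed record has
 * been processed.  Returns true only if the state actually changed.
 */
bool
demo_walrcv_set_streaming(DemoWalRcv *w)
{
    bool changed = false;

    demo_lock(w);
    if (w->state == WALRCV_CONNECTING)
    {
        w->state = WALRCV_STREAMING;
        changed = true;
    }
    demo_unlock(w);
    return changed;
}

WalRcvState
demo_walrcv_get_state(DemoWalRcv *w)
{
    WalRcvState s;

    demo_lock(w);
    s = w->state;
    demo_unlock(w);
    return s;
}
```

Guarding the transition on the current state keeps repeated calls from the
startup process harmless, and avoids overwriting a state set in the meantime
by the walreceiver itself.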

--
Best,
Xuneng

Attachment: v2-0001-Add-WALRCV_CONNECTING-state-to-walreceiver.patch (application/octet-stream, 6.4 KB)
