| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Noah Misch <noah(at)leadboat(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz> |
| Subject: | Re: Add WALRCV_CONNECTING state to walreceiver |
| Date: | 2025-12-12 08:45:56 |
| Message-ID: | CABPTF7UhRKo5GUQ+EuT+pFhcUah8JLG-bpGZwdwK0FmwS04Kfg@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Noah,
On Fri, Dec 12, 2025 at 1:05 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> On Fri, Dec 12, 2025 at 12:51:00PM +0800, Xuneng Zhou wrote:
> > Bug #19093 [1] reported that pg_stat_wal_receiver.status = 'streaming'
> > does not accurately reflect streaming health. In that discussion,
> > Noah noted that even before the reported regression, status =
> > 'streaming' was unreliable because walreceiver sets it during early
> > startup, before attempting a connection. He suggested:
> >
> > "Long-term, in master only, perhaps we should introduce another status
> > like 'connecting'. Perhaps enact the connecting->streaming status
> > transition just before tendering the first byte of streamed WAL to the
> > startup process. Alternatively, enact that transition when the startup
> > process accepts the first streamed byte."
>
> > == Proposal ==
> >
> > Introduce WALRCV_CONNECTING as an intermediate state between STARTING
> > and STREAMING:
> >
> > - When walreceiver starts, it enters CONNECTING (instead of going
> > directly to STREAMING).
> > - The transition to STREAMING occurs in XLogWalRcvFlush(), inside the
> > existing spinlock-protected block that updates flushedUpto.
>
> I think this has the drawback that if the primary's WAL is incompatible,
> e.g. unacceptable timeline, the walreceiver will still briefly enter
> STREAMING. That could trick monitoring.
Thanks for pointing this out.
> Waiting for applyPtr to advance
> would avoid the short-lived STREAMING. What's the feasibility of that?
I think this could work, but with complications. If replay latency is
high or replay is paused with pg_wal_replay_pause(), the walreceiver
would stay in the CONNECTING state longer than expected. Whether this
is OK depends on how we define the 'connecting' state. For the
implementation, deciding where and when to check applyPtr against LSNs
like receiveStart is more difficult: the walreceiver can read applyPtr
from shared memory, but it isn't notified when that pointer advances.
If the check lives in the walreceiver, there would be a delay between
replay advancing and the status changing, unless we let the startup
process set the state, which would couple the two components. Am I
missing something here?
--
Best,
Xuneng