Re: Add WALRCV_CONNECTING state to walreceiver

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add WALRCV_CONNECTING state to walreceiver
Date: 2026-01-22 07:34:32
Message-ID: E13A8008-2E9A-4B2E-9A20-2D81719FA781@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jan 22, 2026, at 12:37, Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> Thanks for the feedbacks.
>
> On Thu, Jan 22, 2026 at 8:20 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>
>> On Wed, Jan 21, 2026 at 08:40:16PM +0800, Xuneng Zhou wrote:
>>> Please see v5 of the updated patch.
>>
>> This seems acceptable here, cool.
>>
>> Something worth a note: I thought that the states of the WAL sender
>> were documented already, and I was wrong in thinking that it was the
>> case. Sorry. :)
>>
>> By the way, the list of state values you are specifying in the patch
>> is not complete. In theory, we allow all state values like "stopped"
>> to show up in a single function call of pg_stat_get_wal_receiver(),
>> but you are not documenting all of them.
>
> My concern for not adding it before is that this status could be
> invisible to users.
> Looking at pg_stat_get_wal_receiver():
>
> if (pid == 0 || !ready_to_display)
> PG_RETURN_NULL();
>
> The function returns NULL (no row) when pid == 0.
>
> When the WAL receiver is in WALRCV_STOPPED state (WalRcvDie), pid is set to 0:
> walrcv->walRcvState = WALRCV_STOPPED;
> walrcv->pid = 0;
>
> So the view returns no row rather than a row with status = 'stopped'.
> But for completeness, maybe we should add it.
>
>> Using a list (like with the
>> <itemizedlist> markup) would be better.
>
> The states are now wrapped in an itemizedlist.
>
>> The documentation improvement can be treated as a separate change,
>> worth its own before adding the new "connecting" state. Could you
>> split that into two patches please? That would make one patch for
>> the documentation changes with the list of state values that can be
>> reported, and a second patch for the new "connecting" state.
>
> The changes have been splitted into two patches as suggested.
>
> --
> Best,
> Xuneng
> <v6-0002-Add-WALRCV_CONNECTING-state-to-walreceiver.patch><v6-0001-Doc-document-all-pg_stat_wal_receiver-status-valu.patch>

Hi Xuneng,

I just reviewed the patch. It splits the current STREAMING state into CONNECTING and STREAMING, where CONNECTING represents the connection establishing phase, which makes sense. The patch is solid, I have just a few small comments:

1 - 0001
```
+ <para>
+ <literal>stopped</literal>: WAL receiver process is not running.
+ This state is normally not visible because the view returns no row
+ when the WAL receiver is not running.
+ </para>
```

“Is not running” appears twice in this short paragraph, looks redundant. Suggestion:
```
<literal>stopped</literal>: WAL receiver process is not running.
This state is normally not visible because the view returns no row in this case.
```

2 - 0002
```
+ WALRCV_CONNECTING, /* connecting to primary, replication not yet
+ * started */
```

I think “primary” is inaccurate. A connection destination could be a primary, or the other standby, so maybe change “primary” to “upstream server”.

3 - 0002
```
* Mark walreceiver as running in shared memory.
```

I think we should update this comment, changing “running” to “connecting”. There is not a “running” state, before this patch, “streaming” can roughly equal to “running”, after this patch, “running” is kinda unclear.

4 - 0002
```
+ /* Connection established, switch to streaming state */
+ SpinLockAcquire(&walrcv->mutex);
+ Assert(walrcv->walRcvState == WALRCV_CONNECTING);
+ walrcv->walRcvState = WALRCV_STREAMING;
+ SpinLockRelease(&walrcv->mutex);
```

I don’t feel good with this Assert(). Looking at ShutdownWalRcv(), the startup process might set the state to STOPPING, so there is a race.

Before this patch, this is no state change here, thus rest logic can handle STOPPING. After this patch, if the race occurs, in dev mode, the Assert is fired; in production mode, STOPPING is overwritten by STREAMING, which is wrong.

So, instead of Assert(), I think we should check if current state is CONNECTING, if not, it should not proceed.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zsolt Parragi 2026-01-22 07:40:38 Re: Periodic authorization expiration checks using GoAway message
Previous Message wenhui qiu 2026-01-22 06:58:29 Re: Report oldest xmin source when autovacuum cannot remove tuples