| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add WALRCV_CONNECTING state to walreceiver |
| Date: | 2026-01-22 07:34:32 |
| Message-ID: | E13A8008-2E9A-4B2E-9A20-2D81719FA781@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Jan 22, 2026, at 12:37, Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> Thanks for the feedbacks.
>
> On Thu, Jan 22, 2026 at 8:20 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>
>> On Wed, Jan 21, 2026 at 08:40:16PM +0800, Xuneng Zhou wrote:
>>> Please see v5 of the updated patch.
>>
>> This seems acceptable here, cool.
>>
>> Something worth a note: I thought that the states of the WAL sender
>> were documented already, and I was wrong in thinking that it was the
>> case. Sorry. :)
>>
>> By the way, the list of state values you are specifying in the patch
>> is not complete. In theory, we allow all state values like "stopped"
>> to show up in a single function call of pg_stat_get_wal_receiver(),
>> but you are not documenting all of them.
>
> My concern for not adding it before is that this status could be
> invisible to users.
> Looking at pg_stat_get_wal_receiver():
>
> if (pid == 0 || !ready_to_display)
> PG_RETURN_NULL();
>
> The function returns NULL (no row) when pid == 0.
>
> When the WAL receiver is in WALRCV_STOPPED state (WalRcvDie), pid is set to 0:
> walrcv->walRcvState = WALRCV_STOPPED;
> walrcv->pid = 0;
>
> So the view returns no row rather than a row with status = 'stopped'.
> But for completeness, maybe we should add it.
>
>> Using a list (like with the
>> <itemizedlist> markup) would be better.
>
> The states are now wrapped in an itemizedlist.
>
>> The documentation improvement can be treated as a separate change,
>> worth its own before adding the new "connecting" state. Could you
>> split that into two patches please? That would make one patch for
>> the documentation changes with the list of state values that can be
>> reported, and a second patch for the new "connecting" state.
>
> The changes have been splitted into two patches as suggested.
>
> --
> Best,
> Xuneng
> <v6-0002-Add-WALRCV_CONNECTING-state-to-walreceiver.patch><v6-0001-Doc-document-all-pg_stat_wal_receiver-status-valu.patch>
Hi Xuneng,
I just reviewed the patch. It splits the current STREAMING state into CONNECTING and STREAMING, where CONNECTING represents the connection establishing phase, which makes sense. The patch is solid, I have just a few small comments:
1 - 0001
```
+ <para>
+ <literal>stopped</literal>: WAL receiver process is not running.
+ This state is normally not visible because the view returns no row
+ when the WAL receiver is not running.
+ </para>
```
“Is not running” appears twice in this short paragraph, looks redundant. Suggestion:
```
<literal>stopped</literal>: WAL receiver process is not running.
This state is normally not visible because the view returns no row in this case.
```
2 - 0002
```
+ WALRCV_CONNECTING, /* connecting to primary, replication not yet
+ * started */
```
I think “primary” is inaccurate. A connection destination could be a primary, or the other standby, so maybe change “primary” to “upstream server”.
3 - 0002
```
* Mark walreceiver as running in shared memory.
```
I think we should update this comment, changing “running” to “connecting”. There is not a “running” state, before this patch, “streaming” can roughly equal to “running”, after this patch, “running” is kinda unclear.
4 - 0002
```
+ /* Connection established, switch to streaming state */
+ SpinLockAcquire(&walrcv->mutex);
+ Assert(walrcv->walRcvState == WALRCV_CONNECTING);
+ walrcv->walRcvState = WALRCV_STREAMING;
+ SpinLockRelease(&walrcv->mutex);
```
I don’t feel good with this Assert(). Looking at ShutdownWalRcv(), the startup process might set the state to STOPPING, so there is a race.
Before this patch, this is no state change here, thus rest logic can handle STOPPING. After this patch, if the race occurs, in dev mode, the Assert is fired; in production mode, STOPPING is overwritten by STREAMING, which is wrong.
So, instead of Assert(), I think we should check if current state is CONNECTING, if not, it should not proceed.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Zsolt Parragi | 2026-01-22 07:40:38 | Re: Periodic authorization expiration checks using GoAway message |
| Previous Message | wenhui qiu | 2026-01-22 06:58:29 | Re: Report oldest xmin source when autovacuum cannot remove tuples |