| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add WALRCV_CONNECTING state to walreceiver |
| Date: | 2026-01-22 10:02:54 |
| Message-ID: | CABPTF7V2GFKwzeHSFKLQpdu9bU1qBh30X1rxgygSKdVhZnwTcw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Chao,
Thanks for looking into this.
>
> Hi Xuneng,
>
> I just reviewed the patch. It splits the current STREAMING state into CONNECTING and STREAMING, where CONNECTING represents the connection establishing phase, which makes sense. The patch is solid, I have just a few small comments:
>
> 1 - 0001
> ```
> + <para>
> + <literal>stopped</literal>: WAL receiver process is not running.
> + This state is normally not visible because the view returns no row
> + when the WAL receiver is not running.
> + </para>
> ```
>
> “Is not running” appears twice in this short paragraph, looks redundant. Suggestion:
> ```
> <literal>stopped</literal>: WAL receiver process is not running.
> This state is normally not visible because the view returns no row in this case.
> ```
>
> 2 - 0002
> ```
> + WALRCV_CONNECTING, /* connecting to primary, replication not yet
> + * started */
> ```
>
> I think “primary” is inaccurate. A connection destination could be a primary, or the other standby, so maybe change “primary” to “upstream server”.
Yeah, "upstream server" is more precise. There are multiple
user-facing log messages in this file that also use “primary”. I’m
wondering whether we should update those as well.
>
>
> 3 - 0002
> ```
> * Mark walreceiver as running in shared memory.
> ```
>
> I think we should update this comment, changing “running” to “connecting”. There is not a “running” state, before this patch, “streaming” can roughly equal to “running”, after this patch, “running” is kinda unclear.
>
It has been changed.
> 4 - 0002
> ```
> + /* Connection established, switch to streaming state */
> + SpinLockAcquire(&walrcv->mutex);
> + Assert(walrcv->walRcvState == WALRCV_CONNECTING);
> + walrcv->walRcvState = WALRCV_STREAMING;
> + SpinLockRelease(&walrcv->mutex);
> ```
>
> I don’t feel good with this Assert(). Looking at ShutdownWalRcv(), the startup process might set the state to STOPPING, so there is a race.
>
> Before this patch, this is no state change here, thus rest logic can handle STOPPING. After this patch, if the race occurs, in dev mode, the Assert is fired; in production mode, STOPPING is overwritten by STREAMING, which is wrong.
>
> So, instead of Assert(), I think we should check if current state is CONNECTING, if not, it should not proceed.
It makes sense to me. Please check the updated patch.
--
Best,
Xuneng
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0001-Add-WALRCV_CONNECTING-state-to-walreceiver.patch | application/octet-stream | 6.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Xuneng Zhou | 2026-01-22 10:06:21 | Re: Add WALRCV_CONNECTING state to walreceiver |
| Previous Message | Bertrand Drouvot | 2026-01-22 09:52:41 | Re: Flush some statistics within running transactions |