| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add WALRCV_CONNECTING state to walreceiver |
| Date: | 2026-01-22 23:08:02 |
| Message-ID: | CA52AB8E-2985-4003-8BAC-56C9EE2B52E4@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Jan 23, 2026, at 06:54, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Jan 22, 2026 at 06:02:54PM +0800, Xuneng Zhou wrote:
>>> Before this patch, this is no state change here, thus rest logic
>>> can handle STOPPING. After this patch, if the race occurs, in dev
>>> mode, the Assert is fired; in production mode, STOPPING is overwritten
>>> by STREAMING, which is wrong.
>>>
>>> So, instead of Assert(), I think we should check if current state
>>> is CONNECTING, if not, it should not proceed.
>>
>> It makes sense to me. Please check the updated patch.
>
> + if (state != WALRCV_CONNECTING)
> + {
> + if (state == WALRCV_STOPPING)
> + proc_exit(0);
> + elog(FATAL, "unexpected walreceiver state");
> + }
>
> Sorry, but this addition does not make sense to me. Before this
> patch, if the startup process decides to mark a WAL receiver as
> stopping after it has been started in WalReceiverMain(), then we would
> run at least one loop until we reach WalRcvWaitForStartPosition(),
> which is the path where the WAL receiver exits. Aka, I don't think we
> should increase the number of paths where we decide if a WAL receiver
> should fast-exit or not (I'd rather try to reduce them, but I don't
> think we can do that currently based on the ping-pong game between the
> startup process and WAL receiver process). Saying that, you are right
> about the fact that the assertion is wrong, and that we should not
> upgrade the status to STREAMING if the WAL receiver is not CONNECTING.
>
> So it seems to me that this code should remain as simple as followed,
> documenting that we switch to STREAMING when the first connection has
> been established after the WAL receiver has started, or when the WAL
> receiver is switched after WalRcvWaitForStartPosition() once
> startstreaming() has acknowledged that streaming is happening (I would
> add a comment saying that):
> if (state == WALRCV_CONNECTING)
> walrcv->walRcvState = WALRCV_STREAMING;
> --
> Michael
+1.
If current state is not CONNECTING, then leave it as is, which will then follow the original logic. Unless we find some issue in future, I think we can take this approach for now.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nathan Bossart | 2026-01-22 23:10:08 | Re: Fix rounding method used to compute huge pages |
| Previous Message | Michael Paquier | 2026-01-22 23:04:29 | Re: Fix rounding method used to compute huge pages |