| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Andrey Silitskiy <a(dot)silitskiy(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ronan Dunklau <ronan(at)dunklau(dot)fr>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | Re: Exit walsender before confirming remote flush in logical replication |
| Date: | 2026-04-08 08:38:32 |
| Message-ID: | 750545C3-A04C-4A62-9CF3-62AD91BD5104@gmail.com |
| Lists: | pgsql-hackers |
> On Apr 8, 2026, at 16:11, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Wed, Apr 8, 2026 at 4:05 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>> Some of my CF entries have also failed on this test case, so I looked into the problem.
>
> Thanks for working on this, much appreciated!
>
>
>> Once it enters WalSndDone(), it might call pq_flush() and get stuck:
>> ```
>>     if (WalSndCaughtUp && sentPtr == replicatedPtr &&
>>         !pq_is_send_pending())
>>     {
>>         QueryCompletion qc;
>>
>>         /* Inform the standby that XLOG streaming is done */
>>         SetQueryCompletion(&qc, CMDTAG_COPY, 0);
>>         EndCommand(&qc, DestRemote, false);
>>         pq_flush();
>>
>>         proc_exit(0);
>>     }
>> ```
>>
>> And once stuck, it will never get back to WalSndCheckShutdownTimeout(), so the new GUC timeout cannot rescue it.
>
> pq_flush() is called when WalSndCaughtUp && sentPtr == replicatedPtr
> && !pq_is_send_pending().
> Under these conditions, I was thinking that we can assume the kernel send
> buffer isn't full, so pq_flush() (i.e., the send() call) can copy the data
> without blocking and return immediately.
>
> I'm not very familiar with FreeBSD, but based on [1], I wonder if this
> assumption may not hold there, and pq_flush() could still block....
>
> Regards,
>
> [1] https://man.freebsd.org/cgi/man.cgi?unix(4)#BUFFERING
>
>> Due to the local nature of the Unix-domain sockets, they do not
>> implement send buffers. The send(2) and write(2) families of system calls
>> attempt to write data to the receive buffer of the destination socket.
>
> --
> Fujii Masao
I don't have a FreeBSD box to verify this directly, but the man page you pointed to states explicitly that Unix-domain sockets there have no send buffer and that writes go directly to the peer's receive buffer. If so, the assumption that "the kernel send buffer isn't full" does not hold on FreeBSD: even when pq_is_send_pending() is false, pq_flush() could block if the peer's receive buffer is full. From that perspective, switching to the non-blocking pq_flush_if_writable() makes sense to me.
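
To illustrate the blocking-versus-non-blocking difference, here is a minimal standalone sketch, not PostgreSQL code and not tested on FreeBSD. The helper name bytes_accepted_before_would_block() is hypothetical. It assumes only POSIX socketpair(), fcntl() and send(): it writes into a Unix-domain socket whose peer never reads and stops when the kernel reports EAGAIN/EWOULDBLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 *
 * Queue data on one end of a Unix-domain stream socket pair whose peer
 * never reads, using a non-blocking send().  Returns the number of bytes
 * the kernel accepted before reporting EAGAIN/EWOULDBLOCK, or -1 on any
 * other error.
 */
static long
bytes_accepted_before_would_block(void)
{
    int     sv[2];
    int     flags;
    char    chunk[4096];
    long    total = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Make the sending end non-blocking. */
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof(chunk));

    for (;;)
    {
        ssize_t n = send(sv[0], chunk, sizeof(chunk), 0);

        if (n > 0)
        {
            /* A partial write is possible, but never more than requested. */
            assert((size_t) n <= sizeof(chunk));
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;              /* no more room: a blocking send() would wait here */

        total = -1;             /* unexpected error */
        break;
    }

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

Calling this from a small main() and printing the result should give a finite, platform-dependent byte count. With a blocking socket the same loop would hang at the point where this version breaks out, which is the kind of stall I suspect pq_flush() hits.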
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/