Re: Exit walsender before confirming remote flush in logical replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrey Silitskiy <a(dot)silitskiy(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ronan Dunklau <ronan(at)dunklau(dot)fr>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Exit walsender before confirming remote flush in logical replication
Date: 2026-04-21 18:32:14
Message-ID: CAHGQGwHnSSg=2Kue-bBqQyER+GYkuADNN4OHUfp9XStOfv1LEw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 8, 2026 at 5:39 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> I don’t have a FreeBSD box to verify that directly. But the document you pointed out seems to state explicitly that, on Unix-domain sockets, writes go directly to the peer’s receive buffer. If so, the assumption that “the kernel send buffer isn’t full” no longer really holds on FreeBSD. From this perspective, changing to non-blocking pq_flush_if_writable() makes sense to me.

I was thinking about the effect of replacing that pq_flush() with
pq_flush_if_writable().

Under normal conditions, there should be no behavioral difference. In either
case, the end-of-streaming message is sent to the standby or subscriber.

The difference only appears if walsender cannot complete the send() call for
that message immediately (that is, it cannot append the message to the kernel
send buffer). This should be rare, because that pq_flush() call happens only
when the condition
"WalSndCaughtUp && sentPtr == replicatedPtr && !pq_is_send_pending()"
holds, in which case the send buffer would normally not be full. However,
as observed, this can apparently happen on FreeBSD when using Unix-domain
sockets.
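
To illustrate the distinction, here is a minimal sketch in Python, not
PostgreSQL code. It assumes a Unix-domain socket pair whose reader never
drains its side. The helpers flush_if_writable() and fill_until_blocked()
are hypothetical names that only loosely mirror pq_flush_if_writable(): the
first makes a single non-blocking send attempt and hands back whatever
could not be sent, where a blocking flush would wait instead.

```python
import socket

MESSAGE = b"end-of-streaming"  # stand-in payload, not the real protocol message


def flush_if_writable(sock, pending):
    """Make one non-blocking send attempt; return the bytes left unsent."""
    sock.setblocking(False)
    try:
        sent = sock.send(pending)
    except BlockingIOError:
        # The kernel cannot accept more data right now; give up
        # instead of waiting, as a blocking flush would.
        return pending
    return pending[sent:]


def fill_until_blocked(sock):
    """Write until the kernel refuses even a single byte; return bytes written."""
    sock.setblocking(False)
    total = 0
    # Large chunks first, then single bytes, so that no spare room is left.
    for chunk in (b"x" * 4096, b"x"):
        while True:
            try:
                total += sock.send(chunk)
            except BlockingIOError:
                break
    return total


def demo():
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Idle connection: the message fits, nothing is left pending.
        left_idle = flush_if_writable(sender, MESSAGE)
        # The peer never reads, so the buffers fill up.
        fill_until_blocked(sender)
        # Now the same call returns immediately with the message unsent.
        left_full = flush_if_writable(sender, MESSAGE)
        return left_idle, left_full
    finally:
        sender.close()
        receiver.close()
```

In the first call the message goes out as usual; in the second it stays
pending and the caller simply moves on, which corresponds to the rare case
discussed above.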

Even if pq_flush_if_writable() fails to send the end-of-streaming message,
walreceiver and the logical apply worker seem to behave almost the same
in practice whether they receive it or not. The main differences are how they
detect closure of the replication connection and the resulting log messages.
If that analysis is correct, replacing pq_flush() with pq_flush_if_writable()
seems acceptable to me.

If the end-of-streaming message is received, walreceiver and the apply worker
log messages like the following, then try to send a reply, which fails because
the connection has already been closed:

[walreceiver] LOG: replication terminated by primary server
[walreceiver] DETAIL: End of WAL reached ...
[walreceiver] FATAL: could not send end-of-streaming message to primary: server closed the connection unexpectedly

[apply worker] LOG: data stream from publisher has ended
[apply worker] ERROR: could not send end-of-streaming message to primary: server closed the connection unexpectedly

If the message is not received, they simply detect the closed connection
while waiting for the next message:

[walreceiver] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly

[apply worker] ERROR: could not receive data from WAL stream: server closed the connection unexpectedly
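
The two receiver-side paths can be modeled with a small Python sketch. This
is only an illustration under stated assumptions: END_OF_STREAM is a
hypothetical one-byte marker rather than the real replication protocol
message, and receiver_outcome() is an invented helper, not anything in
walreceiver or the apply worker.

```python
import socket

END_OF_STREAM = b"E"  # hypothetical marker, not the real protocol message


def receiver_outcome(send_marker):
    """Model a receiver whose peer optionally sends a marker, then closes."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        if send_marker:
            sender.sendall(END_OF_STREAM)
        sender.close()  # the sending side exits right away

        data = receiver.recv(1)
        if data == END_OF_STREAM:
            # Marker received: the receiver now tries to reply, but the
            # peer has already gone away.
            try:
                receiver.sendall(END_OF_STREAM)
            except OSError:
                return "got end-of-stream, reply failed"
            return "got end-of-stream, reply sent"
        # Empty read: the closed connection is noticed while waiting
        # for the next message.
        return "connection closed while receiving"
    finally:
        sender.close()
        receiver.close()
```

Calling receiver_outcome(True) follows the first path (marker seen, then a
reply attempt against a closed peer), while receiver_outcome(False) follows
the second (closure detected on receive). Either way the receiver ends up
noticing that the connection is gone.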

Therefore, since replacing pq_flush() with pq_flush_if_writable() seems to
change behavior only in a limited and acceptable way, I'm planning to write
a patch that makes that replacement.

BTW, though this is not directly related to this topic, I'm also wondering
whether walreceiver can ever successfully send an end-of-streaming message
back to walsender. It appears to attempt that only after the replication
connection has already been closed, which would seem to make it fail every time.

Regards,

--
Fujii Masao
