Re: Exit walsender before confirming remote flush in logical replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andrey Silitskiy <a(dot)silitskiy(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Exit walsender before confirming remote flush in logical replication
Date: 2026-02-02 11:58:00
Message-ID: CAHGQGwG4_FXFsGUjeF3koq9qfudA1oj-rkyO8r8+yc4f_OWh7Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 2, 2026 at 4:56 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Mon, Feb 02, 2026 at 02:52:36AM +0700, Andrey Silitskiy wrote:
> > +/*
> > + * Shutdown walsender in immediate mode.
> > + */
> > +static void
> > +WalSndDoneImmediate()
> > +{
> > + QueryCompletion qc;
> > +
> > + /* Try to inform receiver that XLOG streaming is done */
> > + SetQueryCompletion(&qc, CMDTAG_COPY, 0);
> > + EndCommand(&qc, DestRemote, false);
>
> Couldn't that be potentially dangerous, particularly if
> wal_sender_shutdown_mode is set to immediate, meaning that it applies
> to all the WAL senders? The WAL receiver side could be on a backend
> with an older backend version than the WAL sender where this new GUC
> exists. Hence, a completion could be understood incorrectly by a WAL
> receiver depending on how the receiving side is coded? Assuming this
> is merged into v19 in this shape, a receiver connecting to a newer
> server would get a new behavior compared to older versions, without
> the receiver being aware of that.
>
> I suspect that this option, as designed, is potentially a footgun that
> could surprise many users, especially as it is super aggressive in
> shutting down everything on sight, unconditionally.

I’m not sure this is actually problematic when terminating a walsender in
immediate mode while the subscriber is running an older PostgreSQL version.
I haven't been able to come up with any failure scenario yet. However,
if this does turn out to be an issue, it might be better to use the alternative
approach I mentioned in [1].

With that approach, even in immediate mode the walsender simply receives
SIGTERM and exits. This is not new behavior for walsenders, so even when
the subscriber is on an older version, it should not risk breaking
the shutdown behavior that such a subscriber expects.
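
To make the idea concrete, here is a rough standalone sketch. It is not code
from the patch or from [1], and every name in it (shutdown_mode,
sender_should_exit, and so on) is hypothetical. It only illustrates the
decision described above: once SIGTERM has arrived, an immediate-mode sender
exits without sending any completion message, while the existing mode keeps
waiting until the receiver has confirmed the flush.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of these exist in PostgreSQL. */
typedef enum
{
    SHUTDOWN_WAIT_FLUSH,    /* wait for the receiver to confirm flush */
    SHUTDOWN_IMMEDIATE      /* exit as soon as SIGTERM is seen */
} shutdown_mode;

/* Set from the signal handler, read from the main loop. */
static volatile sig_atomic_t got_sigterm = 0;

static void
handle_sigterm(int signo)
{
    (void) signo;
    got_sigterm = 1;
}

/* Install the SIGTERM handler; returns 0 on success, -1 on failure. */
static int
install_sigterm_handler(void)
{
    return (signal(SIGTERM, handle_sigterm) == SIG_ERR) ? -1 : 0;
}

/*
 * Decide whether the sender loop may exit now.  No extra protocol message
 * is involved: in immediate mode the process just stops, which is what a
 * receiver already sees today when a sender is terminated.
 */
static bool
sender_should_exit(shutdown_mode mode, bool receiver_confirmed_flush)
{
    if (!got_sigterm)
        return false;
    if (mode == SHUTDOWN_IMMEDIATE)
        return true;
    return receiver_confirmed_flush;
}
```

A sender loop in this sketch would call sender_should_exit() on each
iteration and leave the loop when it returns true, so the only thing the
receiver observes in immediate mode is the connection going away.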

Regards,

[1] https://postgr.es/m/CAHGQGwGotoS0VeMDdK6ezkhvdQpWZ5oJvO3QKJKEV6Pc+rZ_9A@mail.gmail.com

--
Fujii Masao
