Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-08-29 18:15:42
Message-ID: CAD21AoDjdeqwTHa5nL-3nfEnNA4SfrP4k0yR90kq68=JOLRWxg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 29, 2025 at 5:31 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> > My understanding of where the synced slot starts to move was not
> > right; it starts from the remote slot's restart_lsn, which could be
> > far ahead of the STATUS_CHANGE record that the startup process is
> > applying, at a point where logical decoding should be enabled. The
> > slotsync worker does not end up trying to decode non-logical WAL
> > records, even if it advances the slot after the startup process has
> > disabled logical decoding.
>
> Let me confirm your point. If the slot is dropped and then re-created
> while the startup process is processing WAL, the WAL records would be
> aligned as below. Your point is that the restart_lsn of the re-created
> slot is the beginning of (b), so all records from there can be decoded, right?
>
> ```
> STATUS_CHANGE true
> RUNNING_XACTS // (a) - generated by the first slot
> ...
> STATUS_CHANGE false // due to the slot drop
> ...
> STATUS_CHANGE true // from here all records are decode-safe
> RUNNING_XACTS // (b) - generated by the second slot, restart_lsn can be set here
> ```

Yes. If I understand correctly, even while the startup process is
applying the second STATUS_CHANGE record (i.e., disabling logical
decoding), the synced slot uses the corresponding remote slot's
restart_lsn, i.e., (b). I believe that if the standby has not yet
received RUNNING_XACTS (b) at that point, the slotsync worker skips
syncing the slot (see the check at the top of synchronize_one_slot()).
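To make the ordering argument concrete, here is a minimal C sketch under loudly stated assumptions: the WAL is modeled as a sorted array of (LSN, type) records mirroring the diagram above, and `WalRec`, `should_skip_sync()` and `decode_safe_from()` are hypothetical helpers, not PostgreSQL code. It only illustrates why a slot whose restart_lsn is (b) never reads a record written while logical decoding was disabled, and the skip rule when (b) has not been received yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified WAL model; not PostgreSQL's record format. */
typedef enum
{
    REC_STATUS_CHANGE,          /* logical decoding status change */
    REC_RUNNING_XACTS,          /* snapshot record a slot can start from */
    REC_OTHER                   /* any other WAL record */
} RecType;

typedef struct
{
    uint64_t    lsn;
    RecType     type;
    bool        enable;         /* only meaningful for REC_STATUS_CHANGE */
} WalRec;

/*
 * Hypothetical skip rule: do not sync a slot whose remote restart_lsn
 * the standby has not received yet.
 */
bool
should_skip_sync(uint64_t remote_restart_lsn, uint64_t received_upto)
{
    return remote_restart_lsn > received_upto;
}

/*
 * Replay status changes in LSN order and report whether logical decoding
 * stays enabled for every record at or after start_lsn.  A STATUS_CHANGE
 * that disables decoding at or after start_lsn makes the range unsafe.
 * Assumes recs is sorted by LSN.
 */
bool
decode_safe_from(const WalRec *recs, size_t n, uint64_t start_lsn)
{
    bool        enabled = false;

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (recs[i].type == REC_STATUS_CHANGE)
            enabled = recs[i].enable;
        if (recs[i].lsn >= start_lsn && !enabled)
            return false;
    }
    return true;
}
```

With the diagram's records placed at LSNs 10 through 70, (a) at 20 and (b) at 70, starting from (b) is safe while starting from (a) crosses the `STATUS_CHANGE false` record and is not.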

>
> > how efficiently to fix it. I've considered a simple idea: the
> > slotsync worker checks IsLogicalDecodingEnabled() before trying to
> > sync each logical slot. However, it doesn't solve the race condition;
> > the startup process can disable logical decoding right after the
> > slotsync worker has passed the check, in which case users would see
> > that the logical slot was created after logical decoding was disabled.
>
> So... even if we add a check to the decoding functions, the startup process
> could still disable logical decoding after that check. Is that also right?

I think so. The IsLogicalDecodingEnabled() check only determines
whether a process can start logical decoding; it doesn't cover
logical decoding processes that are already running. The slot
invalidation mechanism is responsible for those.
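The split between a start-time check and invalidation can be sketched as a sequential C model of one unlucky interleaving. All names here (`SharedState`, `can_start_decoding()`, `disable_logical_decoding()`, `decode_step()`) are hypothetical; `can_start_decoding()` merely stands in for the role IsLogicalDecodingEnabled() plays as described above, and the real slot invalidation machinery is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical model of shared state; names are illustrative only. */
typedef struct
{
    bool        in_use;
    bool        invalidated;
} Slot;

typedef struct
{
    bool        logical_decoding_enabled;
    Slot        slots[NSLOTS];
} SharedState;

/*
 * Start-time gate: it only answers "may a new decoding process start
 * right now?".  The answer can be stale as soon as it is returned.
 */
bool
can_start_decoding(const SharedState *st)
{
    return st->logical_decoding_enabled;
}

/*
 * Startup process disabling logical decoding: clear the flag, then mark
 * every slot currently in use as invalidated so running decoders notice.
 */
void
disable_logical_decoding(SharedState *st)
{
    assert(st != NULL);
    st->logical_decoding_enabled = false;
    for (size_t i = 0; i < NSLOTS; i++)
    {
        if (st->slots[i].in_use)
            st->slots[i].invalidated = true;
    }
}

/*
 * One iteration of a running decoder: it rechecks its own slot rather
 * than the start gate.  Returns false when the decoder must stop.
 */
bool
decode_step(const Slot *slot)
{
    return !slot->invalidated;
}
```

In this model a decoder that passed `can_start_decoding()` just before the disable is still stopped by `decode_step()`. A slot that becomes `in_use` only after `disable_logical_decoding()` has run is not invalidated, which roughly corresponds to the slot-creation race described in the quoted paragraph.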

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
