From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-08-29 18:15:42 |
Message-ID: | CAD21AoDjdeqwTHa5nL-3nfEnNA4SfrP4k0yR90kq68=JOLRWxg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Aug 29, 2025 at 5:31 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> > My understanding of where the synced slot starts to move was not
> > right; it starts from the remote slot's restart_lsn, which could be
> > far ahead of the STATUS_CHANGE record that the startup process is
> > applying but where logical decoding should be enabled. So the
> > slotsync worker never tries to decode non-logical WAL records, even
> > if it advances the slot after the startup process has disabled
> > logical decoding.
>
> Let me confirm your point. If the slot is dropped and then re-created while
> the startup process is still processing WAL, the WAL records would be laid
> out as below. Your point is that the restart_lsn of the created slot is at
> the beginning of (b), so that all records can be decoded, right?
>
> ```
> STATUS_CHANGE true
> RUNNING_XACTS // (a) - generated by the first slot
> ...
> STATUS_CHANGE false // due to the slot drop
> ...
> STATUS_CHANGE true // from here all records are decode-safe
> RUNNING_XACTS // (b) - generated by the second slot, restart_lsn can be set here
> ```
Yes. If I understand it correctly, even when the startup process is
processing the second STATUS_CHANGE record (i.e., disabling logical
decoding), the synced slot uses the corresponding remote slot's
restart_lsn, i.e., (b). I believe that if the standby has not yet
received the RUNNING_XACTS record (b) at that point, the slotsync
worker skips syncing the slot (see the check at the top of
synchronize_one_slot()).
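
To make the skip condition concrete, here is a minimal sketch of the idea, not the actual code. All names (SketchLsn, SketchLocalSlot, sketch_sync_one_slot) are hypothetical, and the real check in synchronize_one_slot() may compare a different LSN than the one used here. The only point is that the local slot is not touched until the standby has received WAL up to the position the remote slot refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are PostgreSQL's real types. */
typedef uint64_t SketchLsn;

typedef enum
{
    SKETCH_SYNC_SKIPPED,        /* standby has not received enough WAL yet */
    SKETCH_SYNC_DONE            /* local slot now mirrors the remote slot */
} SketchSyncResult;

typedef struct SketchLocalSlot
{
    SketchLsn   restart_lsn;
    bool        synced;
} SketchLocalSlot;

/*
 * Sync one slot, assuming the remote slot's restart_lsn points at (b).
 * If the standby has not yet received WAL up to that point, skip the
 * slot for this cycle and leave the local slot untouched.
 */
static SketchSyncResult
sketch_sync_one_slot(SketchLocalSlot *local,
                     SketchLsn remote_restart_lsn,
                     SketchLsn standby_received_lsn)
{
    assert(local != NULL);

    /* Assumed form of the early check: remote position not yet received. */
    if (remote_restart_lsn > standby_received_lsn)
        return SKETCH_SYNC_SKIPPED;

    /* The synced slot starts from the remote slot's restart_lsn. */
    local->restart_lsn = remote_restart_lsn;
    local->synced = true;
    return SKETCH_SYNC_DONE;
}
```

With a remote restart_lsn of 500, a standby that has received WAL only up to 400 skips the slot, while one that has received up to 600 syncs it and starts from 500, regardless of which STATUS_CHANGE record the startup process is applying at that moment.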
>
> > how efficiently to fix it. I've considered a simple idea that the
> > slotsync worker checks IsLogicalDecodingEnabled() before trying to
> > sync one logical slot. However, it doesn't solve the race condition;
> > the startup process can disable logical decoding right after the
> > slotsync passed the check, in which case users would see the logical
> > slot is created after logical decoding is disabled.
>
> So... even if we add a check in the decoding functions, the startup process
> can still disable logical decoding after that check. Is that also right?

I think so. The IsLogicalDecodingEnabled() check only determines
whether a process can start logical decoding; it doesn't cover logical
decoding processes that are already running. The slot invalidation
mechanism is responsible for those.
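
The check-then-act window can be illustrated with a small single-threaded simulation. This is only a sketch under assumptions: RaceState, race_startup_disable() and race_slotsync_create_slot() are hypothetical names, the `between` hook merely stands in for the startup process being scheduled between the check and the slot creation, and no real shared memory or locking is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; not PostgreSQL's real structures. */
typedef struct RaceState
{
    bool        decoding_enabled;   /* what an IsLogicalDecodingEnabled()-style check reads */
    bool        slot_created;       /* whether the slotsync side created a slot */
} RaceState;

/* Hook that simulates another process running inside the race window. */
typedef void (*RaceHook) (RaceState *st);

/* Startup process applying a STATUS_CHANGE record that disables decoding. */
static void
race_startup_disable(RaceState *st)
{
    st->decoding_enabled = false;
}

/*
 * Slotsync side: check first, then act. Returns true if a slot was
 * created. 'between' runs after the check and before the creation.
 */
static bool
race_slotsync_create_slot(RaceState *st, RaceHook between)
{
    assert(st != NULL);

    /* The check only gates whether we may START; it holds no lock. */
    if (!st->decoding_enabled)
        return false;

    /* Race window: the startup process may disable decoding here. */
    if (between != NULL)
        between(st);

    st->slot_created = true;
    return true;
}
```

Calling race_slotsync_create_slot() with race_startup_disable as the hook ends with a slot created while decoding_enabled is false, which is the state the check alone cannot prevent. When decoding is already disabled before the check, no slot is created.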
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com