Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-10-08 22:43:53
Message-ID: CAD21AoCo=00_=qCKw4HO0i0=ZDx8oyEpmfLkr6exFFLHFzAxzA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 7, 2025 at 4:39 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Oct 7, 2025 at 9:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 3, 2025 at 12:43 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Oct 1, 2025 at 10:48 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > The other point to consider is that during promotion after
> > > > UpdateLogicalDecodingStatusEndOfRecovery(), we have multiple things
> > > > that seems to be necessary to perform before backends are allowed to
> > > > write. For example, refer to comments: "If any of the critical GUCs
> > > > have changed, log them before we allow backends to write WAL.*/. I
> > > > think the key thing is that before we set state DB_IN_PRODUCTION in
> > > > ControlFile and mark SharedRecoverstate as RECOVERY_STATE_DONE,
> > > > backends shouldn't be allowed to write WAL. If we want to take an
> > > > exception for writing a WAL during slot_creation before the
> > > > RECOVERY_STATE_DONE is set, we should analyze and explain in comments
> > > > why it is okay to take this exception.
> > >
> > > Agreed.
> > >
> > > As the discussion is becoming more complex, let me summarize our
> > > discussion about the delay_status_change flag and lazy behavior.
> > >
> >
> > Thanks for the summary.
> >
> > > The delay_status_change flag was created to handle a specific timing
> > > issue: there's a brief window where backend processes can
> > > enable/disable logical decoding but cannot write the STATUS_CHANGE
> > > record. This occurs because after the startup process updates the
> > > logical decoding status (in
> > > UpdateLogicalDecodingStatusEndOfRecovery()), backend processes cannot
> > > write WAL records until the startup sets SharedRecoveryState to
> > > RECOVERY_STATE_DONE. The idea is to delay any logical decoding status
> > > changes during this window until WAL writing is permitted system-wide.
> > > An alternative idea being discussed is to allow an exception for
> > > STATUS_CHANGE records, letting them be written even during this
> > > window. While this alternative is simpler and technically feasible, it
> > > could be risky as it breaks the general rule that 'backends cannot
> > > write WAL records until recovery completes.'
> > >
> > > When the process exits or raises an ERROR, the process needs to clean
> > > up temporary and ephemeral slots, which might disable logical
> > > decoding. This deactivation process may involve waiting - either for
> > > concurrent activation/deactivation processes to finish or due to the
> > > delay_status_flag (if implemented). However, waiting during user-level
> > > cleanup (in before_shmem_exit callbacks) isn't ideal since the process
> > > blocks all interrupts. To address this, we introduced lazy behavior,
> > > which delegates the deactivation process to the checkpointer, allowing
> > > it to disable logical decoding asynchronously. This way, the
> > > deactivation during user-level cleanup only needs to disable logical
> > > decoding in shared memory and send signals.
> > >
> > > While we've discussed that if we don't use the idea of the
> > > delay_status_flag we don't need the lazy behavior either, I find that
> > > we still need lazy behavior to handle waits during concurrent status
> > > changes.
>
> Do you mean waits due to 'status_change_inprogress' being true (due to
> concurrent slot-operation) when temp-slot cleanup is happening during
> process_exit?

Yes.

>
> > > Moreover, since we need lazy behavior anyway, the benefits of
> > > implementing the exception-based approach seem limited.
> > >
> >
> > Right, it would be ideal if we can follow the same idea (either in a
> > lazy way or by introducing wait during drop of slot) to disable
> > decoding in all cases. However, during user operations (say slot_drop
> > or subscription drop), doing it lazily has a downside that it can take
> > some time before we could change the effective_wal_level and disable
> > decoding especially when checkpointer (or any other background process
> > we choose to do this work) is already busy with other things. Ideally,
> > it shouldn't be frequent to drop the last logical slot, so it should
> > be okay to take such an exception to keep code simple.
>
> +1. Even when the last slot is dropped frequently, indicating a system
> with a high rate of slot creation and deletion; IMO lazily disabling
> logical decoding can still be beneficial, as it can help avoiding
> unnecessary transitions between logical decoding states.
>

Thank you for the comments.

I've attached the updated patch. As we discussed, it now uses the lazy
deactivation in all cases and has the discussed points in the comments
atop logicalctl.c.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v16-0001-Enable-logical-decoding-dynamically-based-on-log.patch application/octet-stream 98.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2025-10-08 22:54:25 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Previous Message Peter Smith 2025-10-08 22:42:10 Re: pg_createsubscriber --dry-run logging concerns