Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-10-07 11:38:50
Message-ID: CAJpy0uAQ9udtYv_wDFKYNeZe_5ST1OekOrsCQZO=cudizvDGaw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 7, 2025 at 9:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Oct 3, 2025 at 12:43 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Oct 1, 2025 at 10:48 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > The other point to consider is that during promotion after
> > > UpdateLogicalDecodingStatusEndOfRecovery(), we have multiple things
> > > that seems to be necessary to perform before backends are allowed to
> > > write. For example, refer to comments: "If any of the critical GUCs
> > > have changed, log them before we allow backends to write WAL.*/. I
> > > think the key thing is that before we set state DB_IN_PRODUCTION in
> > > ControlFile and mark SharedRecoverstate as RECOVERY_STATE_DONE,
> > > backends shouldn't be allowed to write WAL. If we want to take an
> > > exception for writing a WAL during slot_creation before the
> > > RECOVERY_STATE_DONE is set, we should analyze and explain in comments
> > > why it is okay to take this exception.
> >
> > Agreed.
> >
> > As the discussion is becoming more complex, let me summarize our
> > discussion about the delay_status_change flag and lazy behavior.
> >
>
> Thanks for the summary.
>
> > The delay_status_change flag was created to handle a specific timing
> > issue: there's a brief window where backend processes can
> > enable/disable logical decoding but cannot write the STATUS_CHANGE
> > record. This occurs because after the startup process updates the
> > logical decoding status (in
> > UpdateLogicalDecodingStatusEndOfRecovery()), backend processes cannot
> > write WAL records until the startup sets SharedRecoveryState to
> > RECOVERY_STATE_DONE. The idea is to delay any logical decoding status
> > changes during this window until WAL writing is permitted system-wide.
> > An alternative idea being discussed is to allow an exception for
> > STATUS_CHANGE records, letting them be written even during this
> > window. While this alternative is simpler and technically feasible, it
> > could be risky as it breaks the general rule that 'backends cannot
> > write WAL records until recovery completes.'
> >
> > When the process exits or raises an ERROR, the process needs to clean
> > up temporary and ephemeral slots, which might disable logical
> > decoding. This deactivation process may involve waiting - either for
> > concurrent activation/deactivation processes to finish or due to the
> > delay_status_flag (if implemented). However, waiting during user-level
> > cleanup (in before_shmem_exit callbacks) isn't ideal since the process
> > blocks all interrupts. To address this, we introduced lazy behavior,
> > which delegates the deactivation process to the checkpointer, allowing
> > it to disable logical decoding asynchronously. This way, the
> > deactivation during user-level cleanup only needs to disable logical
> > decoding in shared memory and send signals.
> >
> > While we've discussed that if we don't use the idea of the
> > delay_status_flag we don't need the lazy behavior either, I find that
> > we still need lazy behavior to handle waits during concurrent status
> > changes.

Do you mean waits due to 'status_change_inprogress' being true (due to
concurrent slot-operation) when temp-slot cleanup is happening during
process_exit?

> > Moreover, since we need lazy behavior anyway, the benefits of
> > implementing the exception-based approach seem limited.
> >
>
> Right, it would be ideal if we can follow the same idea (either in a
> lazy way or by introducing wait during drop of slot) to disable
> decoding in all cases. However, during user operations (say slot_drop
> or subscription drop), doing it lazily has a downside that it can take
> some time before we could change the effective_wal_level and disable
> decoding especially when checkpointer (or any other background process
> we choose to do this work) is already busy with other things. Ideally,
> it shouldn't be frequent to drop the last logical slot, so it should
> be okay to take such an exception to keep code simple.

+1. Even when the last slot is dropped frequently, indicating a system
with a high rate of slot creation and deletion; IMO lazily disabling
logical decoding can still be beneficial, as it can help avoiding
unnecessary transitions between logical decoding states.

thanks
Shveta

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2025-10-07 11:43:17 Re: Improve pg_sync_replication_slots() to wait for primary to advance
Previous Message Oliver Ford 2025-10-07 11:38:48 Re: Questionable result from lead(0) IGNORE NULLS