Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-11-17 19:32:21
Message-ID: CAD21AoD3kH95TpjAFXdZxcMGOU+cZYrSbcJmTzZ109F4ZOmReA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 16, 2025 at 9:51 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Nov 14, 2025 at 5:10 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 14, 2025 at 3:12 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Nov 14, 2025 at 4:15 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Nov 14, 2025 at 1:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Nov 14, 2025 at 5:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > I can look at the patch but this inprogress flag is another part which
> > > > > I wanted to avoid if possible again due to its additional complexity.
> > > > > So, I came up with an alternative locking scheme to enable/disable
> > > > > decoding. You can compare both the ideas and share your thoughts:
> > > > >
> > > > > During promotion code path:
> > > > > 1. Acquire LogicalDecodingControlLock, then check and remember whether
> > > > > we need to enable or disable the xlog_logical_info and
> > > > > logical_decoding_enabled. Release LogicalDecodingControlLock.
> > > > > 2. Set xlog_logical_info and mark SharedRecoveryState =
> > > > > RECOVERY_STATE_DONE under one spinlock.
> > > > > 3. After the spinlock and controlfile lock are released, wait for
> > > > > other backends to reflect the xlog_logical_info via
> > > > > WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO)),
> > > > > 4. Acquire LogicalDecodingControlLock in X mode, then there are two
> > > > > cases to deal with:
> > > > > (a) As part of step-1, the decision was to enable logical decoding.
> > > > > So, we first check if some backend has already enabled it by checking
> > > > > logical_decoding_enabled, if so, then we don't need to do anything.
> > > > > Otherwise, once again count_slots to ensure that the concurrent
> > > > > backend hasn't removed them, and if there still exist any, then set
> > > > > logical_decoding_enabled, write a new WAL record, and release the
> > > > > LogicalDecodingControlLock.
> > > > > (b) As part of step-1, the decision was to disable logical decoding.
> > > > > So, we first check if some backend has already enabled it by checking
> > > > > logical_decoding_enabled, if so, then we don't need to do anything.
> > > > > Otherwise, set logical_decoding_enabled to false, write WAL record,
> > > > > and release the LogicalDecodingControlLock.
> > > >
> > > > I'm not sure it works in cases where we need to disable logical
> > > > decoding at the end-of-recovery. Suppose that the decision made in
> > > > step-1 was to disable logical decoding, it's possible that non-logical
> > > > WAL records are written as soon as step-3 finishes while the logical
> > > > decoding is still enabled. This is because the backend processes who
> > > > started after step-3 see xlog_logical_info = false. This ends up with
> > > > logical decoding decoding non-logical WAL records.
> > > >
> > >
> > > If the startup process decides to disable decoding, this means there
> > > doesn't exist any logical slot and wal_level is 'replica', right? If
> > > so, then when we create the first slot before decoding, we should try
> > > to first enable xlog_logical_info, if not already enabled, wait for
> > > all backends to reflect that state. So, that should be sufficient.
> >
> > Right. But I think its (cascaded) standby could have logical slots and
> > decode non-logical WAL records.
> >
>
> Okay, so it is possible because cascaded standby could decode
> non-logical WAL records written on promoted standby after we disable
> xlog_logical_info and before we could disable logical_decoding and
> write WAL for it. This is not possible in the current patch because it
> disables xlog_logical_info, logical_decoding, and writes WAL for the
> same before marking recovery state as recovery_done. So before any
> non-logical WAL in the system could be replicated to a cascaded
> standby node, the WAL for disabling the logical_decoding would be
> replicated, the replay of which will disable logical_decoding on
> cascaded standby node.

Exactly. When disabling logical decoding, I think we have to disable
logical decoding with the WAL while enabling xlog_logical_info, and
then disable xlog_logical_info.

> I have a few questions about this, is it okay to write a new WAL
> record (by startup process) before marking the recovery state as
> recovery_done? Are we doing that even without this patch at any other
> place?

Yes. As far as I know, the startup process writes XLOG_FPW_CHANGE,
XLOG_END_OF_RECOVERY (or CHECKPOINT record), and XLOG_PARAMETER_CHANGE
before marking the recovery state as DONE.

>
> In v26-0002-FIXUP-remove-status_change_allowed-flag, by using
> status_change_inprogress, we ensure that no backend is allowed to
> toggle the logical_wal/decoding status till startup process marks the
> recovery state as recovery_done. I am trying to think what problem
> this part of design prevents. I have considered the following
> scenarios:
>
> Scenario-1:
> 1. Startup process enables logical_wal and logical_decoding. Writes
> WAL record for it
> 2. Backend disables logical_decoding, writes WAL for it, and disables
> logical_wal.
> 3. Startup process sets recovery_done and allows wal_writes
>
> Say, instead of using status_change_inprogress to prevent doing
> step-2, if we had used recovery_in_progress kind of flag then how is
> it possible for backends to create any problem for the current node or
> cascaded standbys? I think the only way a problem can happen is if we
> write the WAL to disable_logical decoding after any backend could have
> written a non-logical WAL information record. Can that happen if we
> use the recovery_in_progress flag to prevent disable of logical_wal?
> If so, how?

The main idea of holding status_change_inprogress until the recovery
end is to prevent concurrent toggling the logical decoding status. In
your scenario, IIUC backends cannot write any WAL yet at step-2 since
it's allowed at step-3. It would end up with a FATAL error actually.
One alternative is to make processes call LocalSetXLogInsertAllowed()
so that they can write WAL even during recovery, but I don't use it as
I'm concerned that it could lead to other problems. On the other hand,
we cannot let the backend to disable logical_decoding and logical_wal
without WAL warite at step-2 because otherwise the cascaded standby
won't disable logical decoding.

> Scenario-2:
> 1. Startup process disables logical_wal and logical_decoding. Writes
> WAL record for it
> 2. Backend enables logical_wal, wait for other backends to reflect
> this state, enable logical_decoding and writes WAL for it.
> 3. Startup process sets recovery_done and allows wal_writes
>
> Here, I see that the new patch is already using recovery_in_progress.
> So, not sure if the latest patch has used status_change_inprogress
> during recovery to cover this scenario. Am I missing something?

Similarly, the backends cannot write WAL records at step-2.

I noticed that the FIXUP patch is not complete; the patch lets the
backend simply return without checking the status_change_inprogress
flag if RecoveryInProgress() returns true, but I think it's wrong. I
think we should have the backend check the flag even during recovery.
Otherwise, we would end up missing the logical decoding change
happening while the status_change_inprogress being true set by the
startup process.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-11-17 19:52:39 Re: Compile error on the aarch64 platform: Missing asm/hwcap.h
Previous Message Tom Lane 2025-11-17 19:08:32 Re: postgresql.conf.sample tab width