| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date: | 2025-11-14 09:38:25 |
| Message-ID: | CAA4eK1KZ9=cBBZn08c_h5W37tnLwZasct6QsFAQpBB_xGGEACw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Nov 14, 2025 at 5:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Nov 13, 2025 at 1:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Nov 12, 2025 at 3:36 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I've attached the updated version patch. I addressed all comments I
> > > got so far, and made some cosmetic changes.
> > >
> >
> > Few comments:
> > 1. BTW, to accomplish what we do in
> > UpdateLogicalDecodingStatusEndOfRecovery(), can't we follow a logic
> > similar to what we do in EnsureLogicalDecodingEnabled()?
> >
> > Basically, enable xlog_logical_info and mark SharedRecoveryState =
> > RECOVERY_STATE_DONE under one spinlock. After the spinlock is
> > released, wait for other backends to reflect the xlog_logical_info via
> > WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO))
> > and then enable logical decoding by setting logical_decoding_enabled
> > and writing a new WAL record.
> >
> > Can't this avoid the special handling required in the patch for the
> > status_change_allowed flag? This flag seems to be primarily required
> > because recovery_done action and the change of wal_level action are
> > being performed separately during promotion. If it is feasible to
> > perform them together, then we don't need an additional state that
> > says logical decoding status_change_allowed.
> >
> > The reason I am trying to explore an alternative to the current way to
> > handle promotion is that it can eliminate one of the complex parts of
> > the patch. But, if that is not feasible then we can live with the
> > current approach.
>
> Thank you for the suggestion! How should it work in cases where we
> need to disable logical decoding at the end of recovery? We accept WAL
> writes as soon as we mark SharedRecoveryState = RECOVERY_STATE_DONE,
> so we need to disable logical decoding with a STATUS_CHANGE WAL write
> beforehand. And then we disable xlog_logical_info and update
> SharedRecoveryState under one spinlock. Probably we need to do these
> operations while status_change_inprogress being true to prevent
> concurrent logical decoding status updates, and clear the flag after
> recovery fully completes. While it seems to work, we need to verify
> the side-effects due to moving xlog_logical_info to XLogCtlData since
> every process needs to retrieve the flag at process startup time.
>
> After thinking of this idea more, I think we can simplify the idea
> more; we do all end-of-recovery operations (updating both
> xlog_logical_info and logical_decoding_eanbled, WAL writes, and
> synchronization) while status_change_inprogress being true, and after
> recovery fully completes we clear the inprogress flag. It's probably
> okay to perform these operations in any order since WAL writes are
> still not permitted, and we can prevent concurrent logical decoding
> status updates until we clear the inprogress flag, which happens after
> marking SharedRecoveryState = RECOVERY_STATE_DONE. I've drafted this
> idea. Please see the 0002 patch.
>
I can look at the patch but this inprogress flag is another part which
I wanted to avoid if possible again due to its additional complexity.
So, I came up with an alternative locking scheme to enable/disable
decoding. You can compare both the ideas and share your thoughts:
During promotion code path:
1. Acquire LogicalDecodingControlLock, then check and remember whether
we need to enable or disable the xlog_logical_info and
logical_decoding_enabled. Release LogicalDecodingControlLock.
2. Set xlog_logical_info and mark SharedRecoveryState =
RECOVERY_STATE_DONE under one spinlock.
3. After the spinlock and controlfile lock are released, wait for
other backends to reflect the xlog_logical_info via
WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO)),
4. Acquire LogicalDecodingControlLock in X mode, then there are two
cases to deal with:
(a) As part of step-1, the decision was to enable logical decoding.
So, we first check if some backend has already enabled it by checking
logical_decoding_enabled, if so, then we don't need to do anything.
Otherwise, once again count_slots to ensure that the concurrent
backend hasn't removed them, and if there still exist any, then set
logical_decoding_enabled, write a new WAL record, and release the
LogicalDecodingControlLock.
(b) As part of step-1, the decision was to disable logical decoding.
So, we first check if some backend has already enabled it by checking
logical_decoding_enabled, if so, then we don't need to do anything.
Otherwise, set logical_decoding_enabled to false, write WAL record,
and release the LogicalDecodingControlLock.
For backends:
I think most of these steps should be sufficient for the backend,
except that we don't need part of step 4(a) (Otherwise, once again
count_slots to ensure that the concurrent backend hasn't removed
them,). This is because the disable couldn't disable when we are
trying to enable the decoding, as one slot would have been present,
though some form of assertion could be added.
And in case of step 4(b), we need to additionally check slots-count
again to decide further on disable as the concurrent session could be
in-progress to create a slot.
--
With Regards,
Amit Kapila.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2025-11-14 09:43:53 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Previous Message | shveta malik | 2025-11-14 09:25:20 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |