Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-09-26 09:41:00
Message-ID: CAExHW5uuSK8FzwF_kb9rMLWdzjCWGVHmxpmfY_-o1GxqzzbcXQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 26, 2025 at 11:13 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>

Replying to this message to avoid forking the thread.

Some more comments:
+/*
+ * This function does several kinds of preparation works required to start
+ * the process of logical decoding status change. If the status change is
+ * required, it ensures we can change logical decoding status, setting
+ * LogicalDecodingCtl->status_change_inprogress on, and returns true.
+ * Otherwise, if it's not required or not allowed (e.g., during recovery
+ * or wal_level = 'logical'), it returns false.
+ */
+static bool
+start_logical_decoding_status_change(bool new_status)
+{
+ Assert(!RecoveryInProgress());

The prologue mentions recovery as a case when we can't change status.
Why is Assert here then?

+
+ /* Prepare and start the activation process if it's disabled */
+ if (!start_logical_decoding_status_change(true))

The prologue of start_logical_decoding_status_change() mentions that
the function returns false if status change is not allowed. If the
function returns false because the status change is not allowed, we
are simply returning from EnsureLogicalDecodingEnabled(). The caller
will assume that logical decoding is enabled. Doesn't it look like a
missing case. May be the code is correct but comments aren't
explaining the situation well.

If a new replication slot is getting created while the last remaining
slot is getting dropped, looks like we rely on
LogicalDecodingControlLock to serialize the operations. Is that
correct? I don't see an injection point in this code path. So I am
imagine there's no direct test for this case.

+void
+DisableLogicalDecodingIfNecessary(bool complete_pending)
+{
+ bool need_wait = false;
+
+ /*
+ * Both complete_pending and proc_exit_inprogress must not be true at the
+ * same time.
+ */

That's evident from the assertion. I feel the comment should explain the reason.

+
+ /* With 'minimal' WAL level, logical decoding is always disabled */
+ if (wal_level == WAL_LEVEL_MINIMAL)
+ return;

Assume a server which has a standby running with wal_level =
'replica'. If we shut it down and restart it with wal_level =
'minimal', that won't be allowed. Hence we don't need to write a
XLOG_LOGICAL_DECODING_STATUS_CHANGE record during recovery in that
case. Do you think this should be documented somewhere? I did spend a
couple hours trying this out before I realized why it is safe to
return from here. May be a testcase for the same?

ReplicationSlotCleanup(bool synced_only)
{
LWLockRelease(ReplicationSlotControlLock);
+
... snip ...
+ if (dropped_logical && nlogicalslots == 0)
+ DisableLogicalDecodingIfNecessary(false);

Some magic going on here. If dropped_logical is true, at least one of
the loops dropped a logical slot. If nlogicalslots is zero, it means
that the last loop did not find any logical slots. When both the
conditions are true, it means that the function dropped the last
logical slot. Hence it calls DisableLogicalDecodingIfNecessary(). Is
that correct? I think we need some comments explaining this. Also
there's a tiny window between dropping the last logical slot and
calling DisableLogicalDecodingIfNecessary() when another logical slot
could be created. But DisableLogicalDecodingIfNecessary() has checks
to avoid disabling logical decoding in that case. I don't think all of
that is clear from the code changes. Maybe that's because of the way
this function is written in the first place. For example, why do we
need multiple loops here? Why not just one loop that drops all the
slots that need to be dropped? It's not as if the same backend is
going to create new temporary slots while it is cleaning up existing
ones.

Same goes for ReplicationSlotsDropDBSlots(). But there it is at least
possible that concurrent backends may have created a slot, so a loop
there is justified.
+ /* Initialize logical info WAL logging state */
+ InitializeProcessXLogLogicalInfo();
+
/*
* Initialize replication slots after pgstat. The exit hook might need to
* drop ephemeral slots, which in turn triggers stats reporting.

Do we need XlogLogicalInfo to be set before initializing replication
slots? Do we need to update the comment here?
--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-09-26 09:44:44 Re: Optimize LISTEN/NOTIFY
Previous Message Joel Jacobson 2025-09-26 09:32:29 Re: Optimize LISTEN/NOTIFY