Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-10-29 12:10:01
Message-ID: CAA4eK1LUmrNWa=JGdedzKD=gM3wq0ZsfKREQu1G8NqqWCpZPgA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 28, 2025 at 5:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> I have noticed a few minor points:
>

Some more comments:
1.
-  if (RecoveryInProgress())
-  {
-    /*
-     * This check may have race conditions, but whenever
-     * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
-     * verify that there are no existing logical replication slots. And to
-     * avoid races around creating a new slot,
-     * CheckLogicalDecodingRequirements() is called once before creating
-     * the slot, and once when logical decoding is initially starting up.
-     */
-    if (GetActiveWalLevelOnStandby() < WAL_LEVEL_LOGICAL)
-      ereport(ERROR,
-          (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-           errmsg("logical decoding on standby requires \"wal_level\"
>= \"logical\" on the primary")));
-  }
+  /* CheckSlotRequirements() has already checked if wal_level >= 'replica' */
+
+  /*
+   * Check if logical decoding is available on standbys. Note that
+   * LogicalDecodingStatusChangeAllowed() returning false implies the
+   * recovery is in-progress. See the comments atop logicalctl.c for
+   * details.
+   */
+  if (!IsLogicalDecodingEnabled() && !LogicalDecodingStatusChangeAllowed())

In this change, code and comments without the patch are easy to
follow. IIUC, this is primarily to allow slot creation during the
promotion. How about something like:
"Check if logical decoding is available on standbys. Typically, when
running a standby, RecoveryInProgress() returning true implies that
LogicalDecodingStatusChangeAllowed() is false. However, during
promotion, there's a brief transitional phase where
RecoveryInProgress() remains true even though
LogicalDecodingStatusChangeAllowed() has already turned true.

In this window, logical decoding enable/disable operations are
permitted on standby, anticipating its transition to primary. The
actual wait for recovery completion is handled within
start_logical_decoding_status_change()."

2
+ * Standby servers inherit the logical decoding and logical WAL writing status
+ * from the primary server.

It sounds a bit odd to say that standby inherits logical WAL writing
status from primary, as there is no WAL writing happening on standby.

3.
other processes might attempt to
+ * enable or disable logical decoding before recovery fully completes

Instead of enable or disable, it would be better to say toggle logical …

4.
+/*
+ * Check the shared memory state and return true if logical information WAL
+ * logging is enabled.
+ */

In this "...logical information WAL logging is enabled." sounds a bit
odd. How about something like: "Returns true if logical WAL logging is
enabled based on the shared memory state."?

5.
EnsureLogicalDecodingEnabled() and DisableLogicalDecodingIfNecessary()
+ * should be used instead if there could be concurrent processes doing writes
+ * or logical decoding.
+ */
+void
+UpdateLogicalDecodingStatus(bool new_status, bool need_lock)

Can we ensure what is written in the above comment by an Assert?

6.
+DisableLogicalDecodingIfNecessary(void)
+{
+ bool pending_disable;
+
+ if (wal_level != WAL_LEVEL_REPLICA)
+ return;
+
+ /*
+ * Sanity check as we cannot disable logical decoding while holding a
+ * logical slot.
+ */
+ Assert(!MyReplicationSlot);

Is this assertion relevant now, considering we disable
logical_decoding only via checkpointer?

7.
- if (SlotIsLogical(s))
- nlogicalslots++;
+ if (SlotIsLogical(s) && s->data.invalidated == RS_INVAL_NONE)
+ n_valid_logicalslots++;

Is it okay to check the invalidated flag without the spinlock? If so,
it is better to write a comment as to why it is safe.

8.
@@ -136,6 +137,12 @@ create_logical_replication_slot(char *name, char *plugin,
temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
failover, false);

+ /*
+ * Ensure the logical decoding is enabled before initializing the logical
+ * decoding context.
+ */
+ EnsureLogicalDecodingEnabled();

EnsureLogicalDecodingEnabled(void)
{
Assert(MyReplicationSlot);

if (wal_level != WAL_LEVEL_REPLICA)
return;

/* Prepare and start the activation process if it's disabled */
if (!start_logical_decoding_status_change(true))
return;

I see that EnsureLogicalDecodingEnabled() sometimes returns without
enabling decoding (like when wal_level is not a replica). It is
possible that the same is not possible in this code flow, but won't it
be better to get the return value and assert if this function returns
false?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Xuneng Zhou 2025-10-29 12:17:47 Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Previous Message Dagfinn Ilmari Mannsåker 2025-10-29 12:04:24 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions