Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-11-03 06:37:38
Message-ID: CAHut+PvkC7s4piL4TGzAy0VB9onk5w7utp7KLCFM8FL3+X7YcA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Sawada-San.

Here are my review comments for v23-0001. Most of them are cosmetic
comment suggestions.

======
src/backend/access/transam/xlog.c

show_effective_wal_level:

1.
+ /*
+ * During the recovery, instead of the local wal_level value, the
+ * primary's configuration is used for effective_wal_level. We need to
+ * check the shared status instead of local XLogLogicalInfo as we don't
+ * update it synchronously during the recovery.
+ */

SUGGESTION (reworded)
During recovery, effective_wal_level reflects the primary's
configuration rather than the local wal_level value. Check the shared
status instead of the local XLogLogicalInfo because XLogLogicalInfo is
not updated synchronously during recovery.

======
src/backend/replication/logical/logicalctl.c

UpdateLogicalDecodingStatus:

2.
+ /* It must be called before allowing the status change globally */
+ Assert(!LogicalDecodingCtl->status_change_allowed);

Reword the comment. Don't say "It".

~~~

start_logical_decoding_status_change:

3.
+/*
+ * This function does several kinds of preparation works required to start
+ * the process of logical decoding status change. If the status change is
+ * required, it sets LogicalDecodingCtl->status_change_inprogress, and
+ * returns true. Otherwise, if it's not required or not allowed (e.g., logical
+ * slots exist or during recovery) it returns false.
+ */

That first sentence "kinds of preparation works" sounds strange.

SUGGESTION:
Performs preparation work required before changing the logical decoding status.

~~~
RequestDisableLogicalDecoding:

4.
+/*
+ * Initiate a request for disabling logical decoding.
+ *
+ * This function expects to be called after dropping or invalidating the
+ * possibly-last logical replication slot as it doesn't check the
logical slot presence.
+ */
+void
+RequestDisableLogicalDecoding(void)

SUGGESTION (rearranged sentence)
This function does not verify whether logical slots exist. It should
be called after dropping or invalidating what might be the last
logical replication slot.

======
src/backend/replication/slot.c

ReplicationSlotRelease:

5.
+ /*
+ * Request to disable logical decoding even though this slot might
+ * have not been the last logical slot to drop. The checkpointer will
+ * check if logical decoding should be disabled afterward anyway.
+ */
+ if (is_logical)
+ RequestDisableLogicalDecoding();

There are some typos "to drop" and "afterward", but maybe it's better
just reword like below:

SUGGESTION:
Request logical decoding to be disabled, even though this slot may not
have been the last logical slot. The checkpointer will verify whether
logical decoding should actually be disabled.

~~~

CheckLogicalSlotExists:

6.
+bool
+CheckLogicalSlotExists(void)
+{
+ bool found = false;
+
+ if (max_replication_slots <= 0)
+ return false;
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (int i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ bool invalidated;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ if (SlotIsPhysical(s))
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ invalidated = s->data.invalidated != RS_INVAL_NONE;
+ SpinLockRelease(&s->mutex);
+
+ if (invalidated)
+ continue;
+
+ found = true;
+ break;
+ }
+ LWLockRelease(ReplicationSlotControlLock);
+
+ return found;
+}

Some inconsistent var usage. e.g. These functions have lots of
similarities but some say 'invalidated' and some say 'is_valid'.

~~~

InvalidatePossiblyObsoleteSlot

7.
Oid dboid, TransactionId snapshotConflictHorizon,
- bool *invalidated)
+ bool *released_lock_out)
...
- return released_lock;
+ *released_lock_out = released_lock;
+ return invalidated;

Is that output parameter 'released_lock_out' really needed? IIUC you
could do away with this parameter entirely, and just check
LWLockHeldByMe(ReplicationSlotControlLock) back where this function
(InvalidatePossiblyObsoleteSlot) was called?

~~~

8.
+ /*
+ * Request the checkpointer process to disable logical decoding if no
+ * valid logical slots exist. While the checkpointer can call this
+ * function during a checkpoint, it initiates the request but not do the
+ * actual deactivation so that it can complete the checkpointing first.
+ */

typo: "but not do the"

SUGGESTED (reworded)
Request the checkpointer to disable logical decoding if no valid
logical slots remain. If called by the checkpointer during a
checkpoint, only the request is initiated; actual deactivation is
deferred until after the checkpoint completes.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2025-11-03 06:53:14 Re: Report bytes and transactions actually sent downtream
Previous Message Vaibhav Dalvi 2025-11-03 06:35:46 Re: Non-text mode for pg_dumpall