From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-10-21 05:27:41 |
Message-ID: | CAHut+PtB4zZrMKm_vUd-JO29Uq2479unx1KR=hGu=sQxfSVXJg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Sawada-San.
Here are some mostly cosmetic review comments for patch v19 (excluding
test code).
======
doc/src/sgml/logicaldecoding.sgml
1.
+ <para>
+ If either condition is met, the operational WAL level becomes equivalent
+ to <literal>logical</literal>, which can be monitored through
+ <xref linkend="guc-effective-wal-level"/> parametr.
+ </para>
Typo? /through/through the/
Typo: /parametr/parameter/
~~~
2.
+ <para>
+ When <varname>wal_level</varname> is set to <literal>replica</literal>,
+ logical decoding is automatically activated upon creation of the first
+ logical replication slot. This activation process involves several steps
+ and requires synchronization among processes, ensuring system-wide
+ onsistency. Conversely, when the last logical replication slot is dropped
Typo: /onsistency/consistency/
~~~
3.
+ <caution>
+ <para>
+ When <varname>wal_level</varname> is set to <literal>replica</literal>,
+ dropping the last logical slot may disable logical decoding on primary,
+ resulting in slots on standbys being invalidated.
+ </para>
+ </caution>
Typo? /on primary/on the primary/
======
doc/src/sgml/system-views.sgml
4.
<para>
<literal>wal_level_insufficient</literal> means that the
- primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to
- perform logical decoding. It is set only for logical slots.
+ logical decoding is disabled on primary (See
+ <xref linkend="logicaldecoding-explanation-log-dec"/>). It is set
+ only for logical slots.
</para>
Typo? /on primary/on the primary/
======
src/backend/access/transam/xlog.c
5.
+ * XXX: We keep the slotsync worker running even after logical
+ * decoding is disabled for simplicity. A future improvement
+ * can consider starting and stopping the worker based on
+ * logical decoding status change.
First sentence might be better written as:
For simplicity, we keep the slotsync worker running even after logical
decoding is disabled.
======
src/backend/commands/publicationcmds.c
6.
+ errmsg("logical decoding should be allowed to publish logical changes"),
+ errhint("Before creating subscriptions, set \"wal_level\" >=
\"logical\" or create a logical replication slot when \"wal_level\" =
\"replica\".")));
6a.
I still felt this this errmsg wording is inconsistent -- e.g. You have
not used the term "allowed" anywhere else in the patch; mostly logical
decoding is said to be enabled/disabled.
Maybe:
"to publish logical changes the logical decoding must be enabled"
Or simply:
"local decoding is not enabled"
~~
6b.
I'm not sure that you really needed to say "or create a logical
replication slot when wal_level = replica". Because, the CREATE
SUBSCRIPTION it refers to would be creating that logical slot anyway,
won't it?
So maybe the errhint only need to say:
Set \"wal_level\" >= \"replica\".
======
src/backend/replication/logical/logicalctl.c
7.
+ * This module enables dynamic control of logical decoding availability.
+ * Logical decoding becomes active under two conditions: when the wal_level
+ * parameter is set to 'logical', or when at least one logical replication
+ * slot exists with wal_level set to 'replica'. The system disables logical
+ * decoding when neither condition is met. Therefore, the dynamic control
+ * of logical decoding availability is required only when wal_level is set
+ * to 'replica'. With 'logical' WAL level, logical decoding is always enabled
+ * whereas with 'minimal' WAL level, it's always disabled.
I thought it is more easy/consistent to refer always wal_level instead
of sometimes referring to WAL level.
SUGGESTION (for last sentence)
Logical decoding is always enabled when wal_level='logical' and always
disabled when wal_level='minimal'.
~~~
8.
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
Missing blank line before the #includes.
~~~
StartupLogicalDecodingStatus:
9.
+ if (last_status)
+ {
+ LogicalDecodingCtl->xlog_logical_info = true;
+ LogicalDecodingCtl->logical_decoding_enabled = true;
+ }
Is the 'if' needed? It could have just said:
LogicalDecodingCtl->xlog_logical_info = last_status;
LogicalDecodingCtl->logical_decoding_enabled = last_status;
Or, why not just delegate to the other function:
UpdateLogicalDecodingStatus(last_status, false);
~~~
start_logical_decoding_status_change:
10.
+ /*
+ * Wait for the recovery to complete. Note that even the checkpointer
+ * can wait for the recovery to complete here without concerning
+ * deadlocks unless the startup process doesn't perform any action
+ * that waits for it after calling
+ * UpdateLogicalDecodingStatusEndOfRecovery().
+ */
Is the wording correct? The "unless ... doesn't" seems backwards.
~~~
LogicalDecodingStatusChangeAllowed:
11.
+ /*
+ * We check the current status to see if we need to change it. If a status
+ * change is in-progress, we need to wait for completion.
+ */
+ if (LogicalDecodingCtl->status_change_inprogress)
Only the 2nd sentence of the comment seems relevant to this code.
AFAICT the 1st sentence belongs to logic that comes later in this
function.
~~~
RequestDisableLogicalDecoding:
12.
+/*
+ * Initiate a request for disabling logical decoding.
+ *
+ * This function expects to be after dropping the possibly-last logical
+ * replication slot as it doesn't check the logical slot presence.
+ */
Typo? /expects to be after/expects to be called after/
~~~
13.
+ /*
+ * Check if the status change is allowed before initiating a deactivation
+ * request, to avoid unnecessary work.
+ */
+ if (!LogicalDecodingStatusChangeAllowed())
+ return;
IMO call this the "disable request" (same as the function comment;
same as the function name) instead of introducing new terminolgy like
"deactivation request".
~~~
DisableLogicalDecodingIfNecessary:
14.
+ /*
+ * When disabling logical decoding, we need to disable logical decoding
+ * first and disable logical information WAL logging in order to ensure
+ * that no logical decoding processes WAL records with insufficient
+ * information.
+ */
wording?
~~~
15.
+ /*
+ * Order all running processes to reflect the xlog_logical_info update.
+ * Unlike when enabling logical decoding, we don't need to wait for all
+ * processes to complete it in this case. We already disabled logical
+ * decoding and it's always safe to write logical information to WAL
+ * records, even when not strictly required. Therefore, we don't need to
+ * wait for all running transactions to finish either.
+ */
Order makes me think about sorting. Maybe "Tell" instead of "Order" is clearer.
~~~
UpdateLogicalDecodingStatusEndOfRecovery:
16.
+ /*
+ * We can use logical decoding if we're using 'logical' WAL level or there
+ * is at least one logical replication slot.
+ */
+ if (wal_level == WAL_LEVEL_LOGICAL || CheckLogicalSlotExists())
+ new_status = true;
+
+ if (LogicalDecodingCtl->logical_decoding_enabled != new_status)
+ need_wal = true;
16a.
The comment would be clearer to say similar to lots of other places:
SUGGESTION
We can use logical decoding if wal_level=logical, or wal_level=replica
and there is at least one logical replication slot.
~
16b.
The conditions are not really needed. Maybe it is easier to write as below:
new_status = (wal_level == WAL_LEVEL_LOGICAL || CheckLogicalSlotExists());
need_wal = (LogicalDecodingCtl->logical_decoding_enabled != new_status);
======
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2025-10-21 05:32:01 | Re: BRIN: Prevent the heapblk overflow during index summarization on very large tables resulting in an infinite loop |
Previous Message | Richard Guo | 2025-10-21 05:26:19 | Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master |