Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-10-22 21:54:18
Message-ID: CAD21AoBb-HPSnAdyr2w1vr9hHX+jckMxSYB2JV6As=10ysingg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 20, 2025 at 10:28 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-San.
>
> Here are some mostly cosmetic review comments for patch v19 (excluding
> test code).

Thank you for reviewing the patch!

>
> ======
> doc/src/sgml/logicaldecoding.sgml
>
> 1.
> + <para>
> + If either condition is met, the operational WAL level becomes equivalent
> + to <literal>logical</literal>, which can be monitored through
> + <xref linkend="guc-effective-wal-level"/> parametr.
> + </para>
>
> Typo? /through/through the/
>
> Typo: /parametr/parameter/
>
> ~~~
>
> 2.
> + <para>
> + When <varname>wal_level</varname> is set to <literal>replica</literal>,
> + logical decoding is automatically activated upon creation of the first
> + logical replication slot. This activation process involves several steps
> + and requires synchronization among processes, ensuring system-wide
> + onsistency. Conversely, when the last logical replication slot is dropped
>
> Typo: /onsistency/consistency/
>
> ~~~
>
> 3.
> + <caution>
> + <para>
> + When <varname>wal_level</varname> is set to <literal>replica</literal>,
> + dropping the last logical slot may disable logical decoding on primary,
> + resulting in slots on standbys being invalidated.
> + </para>
> + </caution>
>
> Typo? /on primary/on the primary/
>
> ======
> doc/src/sgml/system-views.sgml
>
> 4.
> <para>
> <literal>wal_level_insufficient</literal> means that the
> - primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to
> - perform logical decoding. It is set only for logical slots.
> + logical decoding is disabled on primary (See
> + <xref linkend="logicaldecoding-explanation-log-dec"/>). It is set
> + only for logical slots.
> </para>
>
> Typo? /on primary/on the primary/
>
> ======
> src/backend/access/transam/xlog.c
>
> 5.
> + * XXX: We keep the slotsync worker running even after logical
> + * decoding is disabled for simplicity. A future improvement
> + * can consider starting and stopping the worker based on
> + * logical decoding status change.
>
> First sentence might be better written as:
> For simplicity, we keep the slotsync worker running even after logical
> decoding is disabled.
>
> ======
> src/backend/commands/publicationcmds.c
>
> 6.
> + errmsg("logical decoding should be allowed to publish logical changes"),
> + errhint("Before creating subscriptions, set \"wal_level\" >=
> \"logical\" or create a logical replication slot when \"wal_level\" =
> \"replica\".")));
>
> 6a.
> I still felt this this errmsg wording is inconsistent -- e.g. You have
> not used the term "allowed" anywhere else in the patch; mostly logical
> decoding is said to be enabled/disabled.
>
> Maybe:
> "to publish logical changes the logical decoding must be enabled"
>
> Or simply:
> "local decoding is not enabled"
>
> ~~
>
> 6b.
> I'm not sure that you really needed to say "or create a logical
> replication slot when wal_level = replica". Because, the CREATE
> SUBSCRIPTION it refers to would be creating that logical slot anyway,
> won't it?
>
> So maybe the errhint only need to say:
> Set \"wal_level\" >= \"replica\".
>
> ======
> src/backend/replication/logical/logicalctl.c
>
> 7.
> + * This module enables dynamic control of logical decoding availability.
> + * Logical decoding becomes active under two conditions: when the wal_level
> + * parameter is set to 'logical', or when at least one logical replication
> + * slot exists with wal_level set to 'replica'. The system disables logical
> + * decoding when neither condition is met. Therefore, the dynamic control
> + * of logical decoding availability is required only when wal_level is set
> + * to 'replica'. With 'logical' WAL level, logical decoding is always enabled
> + * whereas with 'minimal' WAL level, it's always disabled.
>
> I thought it is more easy/consistent to refer always wal_level instead
> of sometimes referring to WAL level.
>
> SUGGESTION (for last sentence)
> Logical decoding is always enabled when wal_level='logical' and always
> disabled when wal_level='minimal'.
>
> ~~~
>
> 8.
> + *-------------------------------------------------------------------------
> + */
> +#include "postgres.h"
>
> Missing blank line before the #includes.
>
> ~~~
>
> StartupLogicalDecodingStatus:
>
> 9.
> + if (last_status)
> + {
> + LogicalDecodingCtl->xlog_logical_info = true;
> + LogicalDecodingCtl->logical_decoding_enabled = true;
> + }
>
> Is the 'if' needed? It could have just said:
> LogicalDecodingCtl->xlog_logical_info = last_status;
> LogicalDecodingCtl->logical_decoding_enabled = last_status;
>
> Or, why not just delegate to the other function:
> UpdateLogicalDecodingStatus(last_status, false);
>
> ~~~
>
> start_logical_decoding_status_change:
>
> 10.
> + /*
> + * Wait for the recovery to complete. Note that even the checkpointer
> + * can wait for the recovery to complete here without concerning
> + * deadlocks unless the startup process doesn't perform any action
> + * that waits for it after calling
> + * UpdateLogicalDecodingStatusEndOfRecovery().
> + */
>
> Is the wording correct? The "unless ... doesn't" seems backwards.
>
> ~~~
>
> LogicalDecodingStatusChangeAllowed:
>
> 11.
> + /*
> + * We check the current status to see if we need to change it. If a status
> + * change is in-progress, we need to wait for completion.
> + */
> + if (LogicalDecodingCtl->status_change_inprogress)
>
> Only the 2nd sentence of the comment seems relevant to this code.
> AFAICT the 1st sentence belongs to logic that comes later in this
> function.
>
> ~~~
>
> RequestDisableLogicalDecoding:
>
> 12.
> +/*
> + * Initiate a request for disabling logical decoding.
> + *
> + * This function expects to be after dropping the possibly-last logical
> + * replication slot as it doesn't check the logical slot presence.
> + */
>
> Typo? /expects to be after/expects to be called after/
>
> ~~~
>
> 13.
> + /*
> + * Check if the status change is allowed before initiating a deactivation
> + * request, to avoid unnecessary work.
> + */
> + if (!LogicalDecodingStatusChangeAllowed())
> + return;
>
> IMO call this the "disable request" (same as the function comment;
> same as the function name) instead of introducing new terminolgy like
> "deactivation request".
>
> ~~~
>
> DisableLogicalDecodingIfNecessary:
>
> 14.
> + /*
> + * When disabling logical decoding, we need to disable logical decoding
> + * first and disable logical information WAL logging in order to ensure
> + * that no logical decoding processes WAL records with insufficient
> + * information.
> + */
>
> wording?
>
> ~~~
>
> 15.
> + /*
> + * Order all running processes to reflect the xlog_logical_info update.
> + * Unlike when enabling logical decoding, we don't need to wait for all
> + * processes to complete it in this case. We already disabled logical
> + * decoding and it's always safe to write logical information to WAL
> + * records, even when not strictly required. Therefore, we don't need to
> + * wait for all running transactions to finish either.
> + */
>
> Order makes me think about sorting. Maybe "Tell" instead of "Order" is clearer.
>
> ~~~
>
> UpdateLogicalDecodingStatusEndOfRecovery:
>
> 16.
> + /*
> + * We can use logical decoding if we're using 'logical' WAL level or there
> + * is at least one logical replication slot.
> + */
> + if (wal_level == WAL_LEVEL_LOGICAL || CheckLogicalSlotExists())
> + new_status = true;
> +
> + if (LogicalDecodingCtl->logical_decoding_enabled != new_status)
> + need_wal = true;
>
> 16a.
> The comment would be clearer to say similar to lots of other places:
>
> SUGGESTION
> We can use logical decoding if wal_level=logical, or wal_level=replica
> and there is at least one logical replication slot.
>
> ~
>
> 16b.
> The conditions are not really needed. Maybe it is easier to write as below:
>
> new_status = (wal_level == WAL_LEVEL_LOGICAL || CheckLogicalSlotExists());
> need_wal = (LogicalDecodingCtl->logical_decoding_enabled != new_status);

All comments make sense to me. I intentionally left some 'if'
statement for readability (suggested in 16b) but addressed other
comments in the next patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-10-22 21:55:13 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Previous Message Thomas Munro 2025-10-22 21:51:26 Re: IO in wrong state on riscv64