From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-10-15 22:47:36 |
Message-ID: | CAD21AoAQJ0LuRYuj0g8-uB9Qtns88HK_TVdoa5jmX3ZPBK9gvw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Oct 15, 2025 at 12:57 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-San,
>
> Here are some more review comments for v18.
>
> (WIP)
>
> ======
> src/backend/access/transam/xact.c
>
> MarkCurrentTransactionIdLoggedIfAny:
>
> 1.
> - * This returns true if wal_level >= logical and we are inside a valid
> - * subtransaction, for which the assignment was not yet written to any WAL
> - * record.
> + * This returns true if effective WAL level is logical and we are inside
> + * a valid subtransaction, for which the assignment was not yet written to
> + * any WAL record.
>
> Maybe it is better to always substitute 'wal_level' with
> 'effective_wal_level' for consistency?
>
> ~~~
>
> 2.
> - /* wal_level has to be logical */
> + /* effective WAL level has to be logical */
>
> ditto: maybe just say: "effective_wal_level has to be logical".
>
> ~~~
>
> 3.
> + * When effective WAL level is logical, guarantee that a subtransaction's
> + * xid can only be seen in the WAL stream if its toplevel xid has been
> + * logged before. If necessary we log an xact_assignment record with fewer
>
> ditto: maybe say "When effective_wal_level is logical..."
Fixed.
>
> ======
> src/backend/access/transam/xlog.c
>
> 4.
> +const char *
> +show_effective_wal_level(void)
> +{
> + char *str;
> +
> + if (wal_level == WAL_LEVEL_MINIMAL)
> + return "minimal";
> +
> + /*
> + * During the recovery, we need to check the shared status instead of the
> + * local XLogLogicalInfo cache as we don't synchronously update it during
> + * the recovery.
> + */
> + if (RecoveryInProgress())
> + return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
> +
> + if (wal_level == WAL_LEVEL_REPLICA)
> + {
> + /*
> + * With wal_level='replica', XLogLogicalInfo indicates the actual WAL
> + * level.
> + */
> + if (IsXLogLogicalInfoEnabled())
> + str = "logical";
> + else
> + str = "replica";
> + }
> + else
> + str = "logical";
> +
> + return str;
> +}
>
> It would be simpler and have more consistent ternary usage to have
> direct returns in all places instead of just half or them. Then remove
> the 'str' var:
>
> SUGGESTION
> ...
> if (wal_level == WAL_LEVEL_MINIMAL)
> return "minimal";
> ...
> if (RecoveryInProgress())
> return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
> ...
> if (wal_level == WAL_LEVEL_REPLICA)
> return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
> ...
> return "logical";
Thank you for the suggestion! I reconsidered and improved the code
flow of that function.
>
> ======
> src/backend/commands/publicationcmds.c
>
> 5.
> - if (wal_level != WAL_LEVEL_LOGICAL)
> + if (!IsLogicalDecodingEnabled())
> ereport(WARNING,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> - errmsg("\"wal_level\" is insufficient to publish logical changes"),
> - errhint("Set \"wal_level\" to \"logical\" before creating subscriptions.")));
> + errmsg("logical decoding should be allowed to publish logical changes"),
> + errhint("Before creating subscriptions, set \"wal_level\" >=
> \"logical\" or create a logical replication slot when \"wal_level\" =
> \"replica\".")));
>
> Why not just say errmsg "\"effective_wal_level\" is insufficient to
> publish logical changes"
I think that saying "logical decoding should be allowed" sounds more
understandable for users than saying "effective_wal_level is
insufficient".
>
> ======
> src/backend/commands/tablecmds.c
>
> 6.
> - /* should only get here if wal_level >= logical */
> + /* should only get here if effective WAL level is 'logical' */
> Assert(XLogLogicalInfoActive());
> ditto from earlier: say "should only get here if effective_wal_level is logical"
Fixed.
>
> ======
> src/backend/postmaster/checkpointer.c
>
> 7.
> + /*
> + * Disable logical decoding if someone requested to disable logical
> + * decoding. See comments atop logicalctl.c.
> + */
> + DisableLogicalDecodingIfNecessary();
> +
>
> The comment seems a bit repetitive.
>
> SUGGESTION
> Disable logical decoding if someone requested it. See comments atop
> logicalctl.c.
Fixed.
>
> ======
> src/backend/replication/logical/decode.c
>
> 8.
> - errmsg("logical decoding on standby requires \"wal_level\" >=
> \"logical\" on the primary")));
> + errmsg("logical decoding on standby requires \"effective_wal_level\"
> >= \"logical\" on the primary")));
>
> Is that >= really necessary?
>
> Why not say: "... requires \"effective_wal_level\" to be \"logical\" on the .."
>
> ======
> src/backend/replication/logical/logical.c
>
> 9.
> + errmsg("logical decoding on standby requires \"effective_wal_level\"
> >= \"logical\" on the primary"),
> + errhint("Set \"wal_level\" >= \"logical\" or create at least one
> logical slot when \"wal_level\" = \"replica\".")));
>
> ditto above. Those >= seem to make it more complex than necessary.
>
> errmsg can be: "...requires \"effective_wal_level\" to be \"logical\"
> on the ..."
> errhint can be: "Set \"wal_level\" = \"logical\" or..."
>
> ======
> src/backend/replication/logical/slotsync.c
>
> 10.
> ereport(elevel,
> errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> - errmsg("replication slot synchronization requires \"wal_level\" >=
> \"logical\""));
> + errmsg("replication slot synchronization requires
> \"effective_wal_level\" >= \"logical\" on the primary"),
> + errhint("To enable logical decoding on primary, set \"wal_level\" >=
> \"logical\" or create at least one logical slot when \"wal_level\" =
> \"replica\"."));
>
> ditto above. Those >= seem to make it more complex than necessary.
>
> errmsg can be: "...requires \"effective_wal_level\" to be \"logical\"
> on the ..."
> errhint can be: "...set \"wal_level\" = \"logical\" or..."
It seems we're already using >= in many places even before the patch:
src/backend/access/transam/xlogfuncs.c:
errmsg("pg_log_standby_snapshot() can only be used if \"wal_level\" >=
\"replica\"")));
src/backend/postmaster/postmaster.c:
(errmsg("replication slot synchronization (\"sync_replication_slots\"
= on) requires \"wal_level\" >= \"logical\"")));
src/backend/replication/logical/decode.c:
errmsg("logical decoding on standby requires
\"wal_level\" >= \"logical\" on the primary")));
src/backend/replication/logical/logical.c:
errmsg("logical decoding requires \"wal_level\" >= \"logical\"")));
src/backend/replication/logical/logical.c:
errmsg("logical decoding on standby requires \"wal_level\"
>= \"logical\" on the primary")));
src/backend/replication/logical/slotsync.c:
errmsg("replication slot synchronization requires \"wal_level\" >=
\"logical\""));
I agree that it makes the message more complex than necessary. I
imagine that we used that style since we can avoid editing these
messages when we introduce a higher level than 'logical', but I think
it's likely to adjust the messages in that case anyway. So how about
discussing this topic in another thread to simply improve the
user-faced error messages? If we can agree on such changes, this patch
can follow the new style.
>
> ======
> src/backend/replication/slot.c
>
> ReplicationSlotCleanup:
>
> 11.
> + if (SlotIsLogical(s))
> + nlogicalslots++;
> +
>
> How about assign SlotIsLogical(s) to a var like other places, instead
> of calling it 2x?
SlotIsLogical() is a macro so we don't necessarily need to avoid
calling multiple times in terms of performance. Even for readability,
the current style looks reasonable to me.
>
> ~~
>
> ReportSlotInvalidation:
>
> 12.
> case RS_INVAL_WAL_LEVEL:
> - appendStringInfoString(&err_detail, _("Logical decoding on standby
> requires \"wal_level\" >= \"logical\" on the primary server."));
> + appendStringInfoString(&err_detail, _("Logical decoding on standby
> requires the primary server to have either \"wal_level\" >=
> \"logical\" or at least one logical slot created."));
>
>
> 12a.
> ditto previous comments about the >=
Please refer to my comment to your similar comments above.
>
> ~
>
> 12b.
> This error message does not seem to be as good as previous ones. e.g.
> I think it should be mentioning about" replica", because creating
> slots with "minimal" is no good, right?
Fixed.
>
> ~~~
>
> InvalidateObsoleteReplicationSlots:
>
> 13.
> - * - RS_INVAL_WAL_LEVEL: is logical and wal_level is insufficient
> + * - RS_INVAL_WAL_LEVEL: is logical and logical decoding is disabled due to
> + * insufficient WAL level or no logical slot presence.
>
> Maybe it is just easier to say: "...is logical and effective_wal_level
> is not logical"
>
Agreed, fixed.
> ======
> src/backend/storage/ipc/standby.c
>
> 14.
> - if (wal_level < WAL_LEVEL_LOGICAL)
> + if (!IsLogicalDecodingEnabled())
> LWLockRelease(ProcArrayLock);
>
> recptr = LogCurrentRunningXacts(running);
>
> /* Release lock if we kept it longer ... */
> - if (wal_level >= WAL_LEVEL_LOGICAL)
> + if (IsLogicalDecodingEnabled())
> LWLockRelease(ProcArrayLock);
>
> ~
>
> Would it be better to assign IsLogicalDecodingEnabled() to a var here
> instead of 2x calls?
Yeah, I think it should do that since it seems possible that logical
decoding status can change between two calls, resulting in not
releasing the ProcArrayLock.
> ======
> src/backend/utils/cache/inval.c
>
> 15.
> + * When effective WAL level is 'logical', write invalidations into WAL at
> + * each command end to support the decoding of the in-progress transactions.
> + * See CommandEndInvalidationMessages.
>
> ditto some previous review comment: Say "when effective_wal_level is
> logical, write..."
Fixed.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-10-15 22:53:15 | Instability of phycodorus in pg_upgrade tests with JIT |
Previous Message | Masahiko Sawada | 2025-10-15 22:47:01 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |