From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-09-12 08:56:42 |
Message-ID: | CAExHW5s6J-=pxUesRHOrFj6sN-1TjZ1uXMHx7+c2T-d-wpN4gQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Sep 12, 2025 at 9:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
>
> One thing related to this which needs a discussion is after this
> change, it is possible that part of the transaction contains
> additional logical_wal_info. I couldn't think of a problem due to this
> but users using pg_waldump or other WAL reading utilities could
> question this. One possibility is that we always start including
> logical_wal_info for the next new transaction but not sure if that is
> required. It would be good if other people involved in the discussion
> or otherwise could share their opinion on this point.
>
AFAIR, logical info is a separate section in a WAL record, and there
is not marker which says "WAL will contain logical info henceforth".
So the utilities should be checking for the existence of such info
before reading it. So I guess it should be ok. Some extra sensitive
utilities may expect that once a WAL record has logical info, all the
succeeding WAL records will have it. They may find it troublesome that
WAL records with and without logical info are interleaved. Generally,
I would prefer that presence/absence of logical info changes at
transaction boundaries, but we will still have interleaving WAL
records. So I doubt how much that matters.
Sorry for jumping late in the discussion. I have a few comments,
mostly superficial ones. I am yet to take a deeper look at the
synchronization logic.
<sect2 id="logicaldecoding-replication-slots">
@@ -328,8 +362,7 @@ postgres=# select * from
pg_logical_slot_get_changes('regression_slot', NULL, NU
that could be needed by the logical decoding on the standby (as it does
not know about the <literal>catalog_xmin</literal> on the standby).
Existing logical slots on standby also get invalidated if
- <varname>wal_level</varname> on the primary is reduced to less than
- <literal>logical</literal>.
+ logical decoding becomes disabled on the primary.
s/becomes disabled/is disabled/ or /gets disabled/. Given that logical
decoding can be disabled in multiple ways, it's better to add a
reference here to a section which explains what disabling logical
decoding means.
<listitem>
<para>
<literal>wal_level_insufficient</literal> means that the
- primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to
- perform logical decoding. It is set only for logical slots.
+ logical decoding is disabled on primary due to insufficient
+ <xref linkend="guc-wal-level"/> or no logical slots. It is set only
+ for logical slots.
It may not be apparent to the users that insufficient wal_level means
'minimal' here. It will be better if we just mention logical decoding
is disabled on primary and refer to a section which explains what
disabling logical decoding means.
*
* Skip this if we're taking a full-page image of the new page, as we
* don't include the new tuple in the WAL record in that case. Also
- * disable if wal_level='logical', as logical decoding needs to be able to
- * read the new tuple in whole from the WAL record alone.
+ * disable if logical decoding is enabled, as logical decoding needs to be
+ * able to read the new tuple in whole from the WAL record alone.
*/
Not fault of this patch, but I find the comment to be slighly out of
sync with the code. The code actually checks whether the logical
decoding is enabled and whether the relation requires WAL to be logged
for logical decoding. The difference is subtle, but it did cause me a
bit of confusion when I read the code. Please consider rephrasing the
comment while you are modifying it.
if (oldbuf == newbuf && !need_tuple_data &&
!XLogCheckBufferNeedsBackup(newbuf))
@@ -9057,8 +9057,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
/*
* Perform XLogInsert of an XLOG_HEAP2_NEW_CID record
*
- * This is only used in wal_level >= WAL_LEVEL_LOGICAL, and only for catalog
- * tuples.
+ * This is only used when effective WAL level is logical, and only for
Given that the earlier comment used GUC name, it's better to use the
GUC name "effective_wal_level" here.
case XLOG_PARAMETER_CHANGE:
+
+ /*
+ * Even if wal_level on the primary got decreased to 'replica' it
+ * doesn't necessarily mean to disable the logical decoding as
+ * long as we have at least one logical slot. So we don't check
+ * the logical decoding availability here but do in
+ * XLOG_LOGICAL_DECODING_STATUS_CHANGE case.
+ */
The earlier code checked for wal_level < WAL_LEVEL_LOGICAL, which
includes the case when wal_level is 'minimal'. I don't see the new
code handling 'minimal' case here. Am I missing something? Do we need
a comment here which specifically mentions "minimal" case
grammar "Even if wal_level on the primary was lowered to 'replica', as
long as there is at least one logical slot, the logical decoding
remains enabled. ... " also ... do so in ... .
+ * The module maintains separate controls of two aspects: writing information
+ * required by logical decoding to WAL records and utilizing logical decoding
+ * itself, controlled by LogicalDecodingCtl->xlog_logical_info and
+ * ->logical_decoding_enabled fields respectively. The activation process of
+ * logical decoding involves several steps, beginning with maintaining logical
+ * decoding in a disabled state while incrementing the effective WAL level to
+ * its 'logical' equivalent. This change is reflected in the read-only
+ * effective_wal_level GUC parameter. The process includes necessary
+ * synchronization to ensure all processes adapt to the new effective WAL
+ * level before logical decoding is fully enabled. Deactivation follows a
+ * similarly careful, multi-step process in the reverse order.
+ *
I was expecting that the step-by-step process would be described in a
README. But given that we have this detailed comment here, it may be
good to have the step-by-step process described here itself.
I see some comments use "effective_wal_level is logical" and some
mention "logical decoding is enabled" depending upon the context. I
think, it's important to differentiate between these two given that
the it's possible to find the system in a state where
effective_wal_level is 'logical' but logical decoding is disabled. But
as a first reader, this did cause me some confusion. Above paragraph
is a good place to make that distinction clear. The description of
step-by-step process will make things clearer.
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* NB: intentionally counting invalidated slots */
Explain the reason in the comment.
+
+# Cleanup all existing slots and start the concurrency test.
+$primary->safe_psql('postgres',
+ qq[select pg_drop_replication_slot('test_slot')]);
We should add effective wal level test. At a later point we have a
test for this but it would be good to make sure that the effective wal
level is replica at this stage in the test as well.
test_wal_level($primary, "replica|replica", "effective_wal_level reset
to 'replica' after dropping all logical slots");
+
+# Wait for the logical slot 'test_slot' has been created.
Should be "Wait for the logical slot 'test_slot' to be created" or
"Check that the logical slot 'test_slot' has been created".
+# Check if the standby's effective_wal_level should be 'logical' in spite
+# of wal_level being 'replica'.
+test_wal_level($standby1, "replica|logical",
+ "effective_wal_level='logical' on standby");
Do we have a test to verify that a logical replication slot can not be
created on a standby whose primary does *not* have effective_wal_level
'logical'?
+
+# Create a logical slot on the standby, which should be succeeded
grammar: ..., which should succeed OR better "Creating a logical slot
on standby should succeed".
+# as the primary enables it.
as the primary has logical decoding enabled.
+# Check if the logical decoding is not enabled on the standby4.
+test_wal_level($standby4, "logical|replica",
+ "standby's effective_wal_level got decreased to 'replica'");
+$standby4->safe_psql('postgres',
+ qq[select pg_drop_replication_slot('standby4_slot')]);
+
Instead of dropping the slot, if we create another slot on the
primary, what happens to the invalidated replication slot? Does it
remain invalidated? Have we covered this scenario in tests?
+
+done_testing();
What happens if we create a logical slot when wal_level is 'minimal'?
Do we have a test for that?
--
Best Wishes,
Ashutosh Bapat
From | Date | Subject | |
---|---|---|---|
Next Message | Shlok Kyal | 2025-09-12 09:04:26 | Re: issue with synchronized_standby_slots |
Previous Message | Chao Li | 2025-09-12 08:51:14 | Re: Add support for entry counting in pgstats |