From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-09-02 12:12:13 |
Message-ID: | CANhcyEVoMLojVk7pt6fwOepZzj5LJFV9VpnjhBwiOmcNcq+5oQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 2 Sept 2025 at 17:05, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Aug 29, 2025 at 9:38 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've attached the updated patch.
> >
>
> Few comments:
> =============
> 1.
> + * When XLogLogicalInfoActive() is true, guarantee that a subtransaction's
> + * xid can only be seen in the WAL stream if its toplevel xid has been
> + * logged before. If necessary we log an xact_assignment record with fewer
> + * than PGPROC_MAX_CACHED_SUBXIDS. Note that it is fine if didLogXid isn't
> + * set for a transaction even though it appears in a WAL record, we just
> + * might superfluously log something. That can happen when an xid is
> + * included somewhere inside a wal record, but not in XLogRecord->xl_xid,
> + * like in xl_standby_locks.
> */
> if (isSubXact && XLogLogicalInfoActive() &&
> !TopTransactionStateData.didLogXid)
>
> Instead of writing XLogLogicalInfoActive() is true in comments, can we
> say When effective wal_level is logical and then also point to some
> place if required where the patch has explained about effective
> wal_level? Otherwise, it sounds like we are writing what is apparent
> from code and may not be very clear.
>
> 2.
> - /*
> - * Invalidate logical slots if we are in hot standby and the primary
> - * does not have a WAL level sufficient for logical decoding. No need
> - * to search for potentially conflicting logically slots if standby is
> - * running with wal_level lower than logical, because in that case, we
> - * would have either disallowed creation of logical slots or
> - * invalidated existing ones.
> - */
> - if (InRecovery && InHotStandby &&
> - xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> - wal_level >= WAL_LEVEL_LOGICAL)
> - InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
> - 0, InvalidOid,
> - InvalidTransactionId);
> -
> LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> ControlFile->MaxConnections = xlrec.MaxConnections;
> ControlFile->max_worker_processes = xlrec.max_worker_processes;
> @@ -8605,6 +8643,50 @@ xlog_redo(XLogReaderState *record)
> {
> /* nothing to do here, just for informational purposes */
> }
> + else if (info == XLOG_LOGICAL_DECODING_STATUS_CHANGE)
> + {
> + bool logical_decoding;
> +
> + /* Update the status on shared memory */
> + memcpy(&logical_decoding, XLogRecGetData(record), sizeof(bool));
> + UpdateLogicalDecodingStatus(logical_decoding, true);
> +
> + if (InRecovery && InHotStandby)
> + {
> + if (!logical_decoding)
>
> Like previously, shouldn't we have a check for standby's wal_level as
> well due to the reasons mentioned in the removed comments?
>
> 3.
> + errmsg("logical decoding needs to be enabled to publish logical changes"),
>
> This message doesn't sound intuitive. How about "logical decoding
> should be allowed to publish logical changes"?
>
> 4.
> + else if (info == XLOG_LOGICAL_DECODING_STATUS_CHANGE)
> ...
> + /*
> + * Request to launch or shutdown the slotsync worker depending on
> + * the new logical decoding status.
> + */
>
> If we see a similar part in existing code as a handling of
> XLOG_PARAMETER_CHANGE, we don't shutdown or restart slotsync worker,
> so why do it as part of this patch? This new behaviour may be better
> but shouldn't we try to handle it as a separate HEAD patch? Also, a
> few additional comments explaining the rationale behind this would be
> good.
>
I tested the behaviour with HEAD and with Patch. And I confirmed the
change in behaviour between HEAD and Patch
Suppose we have a primary and a standby with wal_level = logical and
guc parameters to enable slot sync worker are set accordingly. A slot
sync worker will be running.
Now we change the value of wal_level for primary to replica. And
restart the primary server
With HEAD, during restart the existing sync_slot_worker will exit with:
2025-09-02 11:49:08.846 IST [3877882] ERROR: synchronization worker
"" could not connect to the primary server: connection to server at
"localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
2025-09-02 11:49:11.380 IST [3877885] FATAL: streaming replication
receiver "walreceiver" could not connect to the primary server:
connection to server at "localhost" (127.0.0.1), port 5432 failed:
Connection refused
Is the server running on that host and accepting TCP/IP connections?
and after the restart of the primary server, slot sync worker will
restart and it is able to connect to the primary.
With Patch, during restart the existing sync_slot_worker will exit.
But after the restart of the primary server, slot sync worker cannot
start and we can see following log:
2025-09-02 12:44:51.497 IST [3947520] LOG: replication slot
synchronization worker is shutting down on receiving SIGINT
2025-09-02 12:44:51.498 IST [3943504] LOG: replication slot
synchronization requires logical decoding to be enabled
2025-09-02 12:44:51.498 IST [3943504] HINT: To enable logical
decoding on primary, set "wal_level" >= "logical" or create at least
one logical slot when "wal_level" = "replica".
2025-09-02 12:45:51.537 IST [3943504] LOG: replication slot
synchronization requires logical decoding to be enabled
2025-09-02 12:45:51.537 IST [3943504] HINT: To enable logical
decoding on primary, set "wal_level" >= "logical" or create at least
one logical slot when "wal_level" = "replica".
So, with HEAD, after we restart the primary server with 'wal_level =
replica', the slot sync worker can restart and connect to the primary
but with patch it cannot start after restart due to the check in
ValidateSlotSyncParams.
> 5.
> Assert(RecoveryInProgress());
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> - errmsg("logical decoding on standby requires \"wal_level\" >=
> \"logical\" on the primary")));
> + errmsg("logical decoding must be enabled on the primary")));
>
> Can't we keep the tone of the existing message as it is? How about
> "logical decoding on standby requires \"effective_wal_level\" >=
> \"logical\" on the primary"? Also, if we agree with this, we could
> have a similar change for other messages in the patch.
>
> 6. Can we write some comments as to why we didn't support wal_level to
> be changed from minimal to logical? It will be helpful for future
> readers/authors to understand what it would require to further extend
> this functionality.
>
Thanks,
Shlok Kyal
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Jones | 2025-09-02 12:16:08 | Re: [PATCH] Fix ALTER SYSTEM empty string bug for GUC_LIST_QUOTE parameters |
Previous Message | Shubham Khanna | 2025-09-02 11:42:04 | Re: Add support for specifying tables in pg_createsubscriber. |