From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-08-01 23:24:51 |
Message-ID: | CAD21AoBObzgdOgZOXPb0zGWcNXdkXABj_1u+9TFZyQBt4aY=pw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jul 28, 2025 at 9:44 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 25 Jul 2025 at 11:45, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 22, 2025 at 11:44 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jul 22, 2025 at 5:03 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > Yes, I agree. The main patch focuses on the part where we
> > > > automatically change the effective WAL level upon the logical slot
> > > > creation and deletion (and potentially remove 'logical' from
> > > > wal_level), and other things are implemented as additional features in
> > > > a separate patch.
> > >
> > > I am keeping my focus on patch001 until we decide further on how to
> > > protect the slot.
> >
> > Yeah, I also dropped the additional feature patch from the patch set for now.
> >
> > > Apart from few comments in [1], please find one
> > > more concern:
> > >
> > > There is a race condition between creating and dropping a replication
> > > slot when enabling or disabling logical decoding. We might end up with
> > > logical decoding disabled even when a logical slot is present.
> > >
> > > Steps:
> > > 1) Set wal_level=replica on primary.
> > > 2) Create logical_slot1 which will enable logical decoding, causing
> > > effective_wal_level to become logical.
> > > 3) Drop logical_slot1 and pause execution inside
> > > DisableLogicalDecodingIfNecessary() right after the
> > > 'n_inuse_logical_slots' check using a debugger.
> > > 4) In another session, create logical_slot2. It will attempt to enable
> > > logical-decoding but since it is already enabled,
> > > EnsureLogicalDecodingEnabled() will be a no-op.
> > > 5) Release debugger of drop-slot, it will disable logical decoding.
> > >
> > > Ultimately, logical_slot2is present while logical decoding is disabled
> > > and thus we see this:
> > >
> > > postgres=# select slot_name from pg_replication_slots;
> > > slot_name
> > > ---------------
> > > logical_slot2
> > >
> > > postgres=# show effective_wal_level;
> > > effective_wal_level
> > > ---------------------
> > > replica
> > > (1 row)
> > >
> > > postgres=# select pg_logical_slot_get_changes('logical_slot2', NULL,
> > > NULL, 'proto_version', '4', 'publication_names', 'pub');
> > > ERROR: logical decoding is not enabled
> > > HINT: Set "wal_level" >= "logical" or create at least one logical slot.
> > >
> > > Shall we acquire LogicalDecodingControlLock in exclusive mode at a
> > > little earlier stage? Currently we acquire it after
> > > IsLogicalDecodingEnabled() check. I think we shall acquire it before
> > > this check in in both enable and disable flow?
> >
> > Thank you for testing the patch!
> >
> > I've reworked the locking part in the patch. The attached v4 patch
> > should address all review comments including your previous
> > comments[1].
>
> Few comments:
> 1) pg_waldump not handled for the new WAL record added
> XLOG_LOGICAL_DECODING_STATUS_CHANGE:
> + XLogRegisterData(&logical_decoding, sizeof(bool));
> + recptr = XLogInsert(RM_XLOG_ID,
> XLOG_LOGICAL_DECODING_STATUS_CHANGE);
> + XLogFlush(recptr);
>
> rmgr: XLOG len (rec/tot): 54/ 54, tx: 0, lsn:
> 0/017633D8, prev 0/01763360, desc: PARAMETER_CHANGE
> max_connections=100 max_worker_processes=8 max_wal_senders=10
> max_prepared_xacts=10 max_locks_per_xact=64 wal_level=replica
> wal_log_hints=off track_commit_timestamp=off
> rmgr: XLOG len (rec/tot): 27/ 27, tx: 0, lsn:
> 0/01763410, prev 0/017633D8, desc: UNKNOWN (f0)
> rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn:
> 0/01763430, prev 0/01763410, desc: RUNNING_XACTS nextXid 754
> latestCompletedXid 753 oldestRunningXid 754
>
> 2) Similarly pg_walinspect also should be handled for the new WAL record added:
> postgres=# SELECT * FROM pg_get_wal_records_info('0/017633D8', '0/01763468');
> start_lsn | end_lsn | prev_lsn | xid | resource_manager |
> record_type | record_length | main_data_length | fpi_length |
> description
> | block_ref
> ------------+------------+------------+-----+------------------+------------------+---------------+------------------+------------+-----------------------------------------------------------
> ---------------------------------------------------------------------------------------------------------------+-----------
> 0/017633D8 | 0/01763410 | 0/01763360 | 0 | XLOG |
> PARAMETER_CHANGE | 54 | 28 | 0 |
> max_connections=100 max_worker_processes=8 max_wal_senders
> =10 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=replica
> wal_log_hints=off track_commit_timestamp=off |
> 0/01763410 | 0/01763430 | 0/017633D8 | 0 | XLOG |
> UNKNOWN (f0) | 27 | 1 | 0 |
>
> |
> 0/01763430 | 0/01763468 | 0/01763410 | 0 | Standby |
> RUNNING_XACTS | 50 | 24 | 0 |
> nextXid 754 latestCompletedXid 753 oldestRunningXid 754
>
> |
> (3 rows)
>
> 3) Should this be the other way around? Would it be better to throw
> the error earlier, instead of waiting for the running transactions to
> finish?
> @@ -136,6 +137,9 @@ create_logical_replication_slot(char *name, char *plugin,
> temporary ?
> RS_TEMPORARY : RS_EPHEMERAL, two_phase,
> failover, false);
>
> + EnsureLogicalDecodingEnabled();
> + CheckLogicalDecodingRequirements();
>
> 4) The includes xlog_internal, xlogutils, atomics, lwlock, procsignal,
> shmem, standby and guc is not required, I was able to compile without
> it:
> + * src/backend/replication/logical/logicalctl.c
> + *
> + *-------------------------------------------------------------------------
> + */
> +#include "postgres.h"
> +
> +#include "access/xlog_internal.h"
> +#include "access/xlogutils.h"
> +#include "access/xloginsert.h"
> +#include "catalog/pg_control.h"
> +#include "port/atomics.h"
> +#include "miscadmin.h"
> +#include "storage/lwlock.h"
> +#include "storage/procarray.h"
> +#include "storage/procsignal.h"
> +#include "storage/ipc.h"
> +#include "storage/lmgr.h"
> +#include "storage/shmem.h"
> +#include "storage/standby.h"
> +#include "replication/logicalctl.h"
> +#include "replication/slot.h"
> +#include "utils/guc.h"
> +#include "utils/wait_event_types.h"
>
> 5) I felt this change is not related to this patch:
> @@ -1144,7 +1152,7 @@ slotsync_reread_config(void)
> if (old_sync_replication_slots != sync_replication_slots)
> {
> ereport(LOG,
> - /* translator: %s is a GUC variable name */
> + /* translator: %s is a GUC variable name */
> errmsg("replication slot
> synchronization worker will shut down because \"%s\" is disabled",
> "sync_replication_slots"));
> proc_exit(0);
>
> 6) Can we include the high level design in the commit message and also
> the other possible designs that were considered before finalizing on
> this, it will help new reviewers to get a head start as the thread is
> a long thread.
>
> 7) I did not see documentation added, can we add the required
> documentation for this.
>
> 8) The new test file added should be included in meson.build file
>
Thank you for reviewing the patch! These comments have been addressed
in the latest patch I've just submitted[1].
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2025-08-01 23:25:17 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Previous Message | Masahiko Sawada | 2025-08-01 23:23:16 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |