| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date: | 2025-11-12 02:05:02 |
| Message-ID: | CAD21AoC5FktoPX6xrzGnoNqR=ViQrJmYSRB-7RA+fDh84AgPzg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Nov 11, 2025 at 2:33 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Nov 11, 2025 at 9:35 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > Thanks for the patch. Please find a few comments:
> >
> >
> > 1)
> > ReplicationSlotsDropDBSlots:
> >
> > + SpinLockAcquire(&s->mutex);
> > + invalidated = s->data.invalidated == RS_INVAL_NONE;
> > + SpinLockRelease(&s->mutex);
> > +
> > + /*
> > + * Count slots on other databases too so we can disable logical
> > + * decoding only if no slots in the cluster.
> > + */
> > + if (invalidated)
> > + n_valid_logicalslots++;
> >
> >
> > This seems confusing to me. Can we instead do:
> >
> > SpinLockAcquire(&s->mutex);
> > if (s->data.invalidated == RS_INVAL_NONE)
> > n_valid_logicalslots++;
> > SpinLockRelease(&s->mutex);
> >
>
> Or we can name the variable as slot_valid. Here, the name of the
> variable invalidated sounds inverse of its purpose.
>
> Few other comments:
> 1.
> It should be
> + * called after dropping or invalidating what might be the last logical
> + * replication slot.
> + */
> +void
> +RequestDisableLogicalDecoding(void)
> …
> …
> + * Request to disable logical decoding, even though this slot may not
> + * have been the last logical slot. The checkpointer will verify if
> + * logical decoding should actually be disabled.
> + */
> + if (is_logical)
> + RequestDisableLogicalDecoding();
>
> The above two comments don't match with each other. The first one
> seems to suggest that RequestDisableLogicalDecoding() should be called
> for the last logical slot whereas the caller says it could be called
> for non-last slots as well.
>
> 2. The comments atop ReplicationSlotCleanup() and
> ReplicationSlotsDropDBSlots() don't reflect recent changes in those
> functions.
>
> 3.
> + consistency. Conversely, if <varname>wal_level</varname> set to
>
> /set to/is set to
>
> 4.
> --- a/src/backend/postmaster/checkpointer.c
> +++ b/src/backend/postmaster/checkpointer.c
> @@ -51,6 +51,7 @@
> #include "postmaster/bgwriter.h"
> #include "postmaster/interrupt.h"
> #include "replication/syncrep.h"
> +#include "replication/logicalctl.h"
>
> Here, the include ordering is incorrect (logicalctl.h should be before
> syncrep.h).
>
> 5.
> +#include "storage/ipc.h"
> +#include "storage/lmgr.h"
> +#include "storage/proc.h"
> +#include "storage/procarray.h"
> +#include "replication/logicalctl.h"
> +#include "replication/slot.h"
>
> Here, the include ordering is incorrect (replication should be before storage).
>
> 6.
> + /*
> + * This flag indicates whether logical decoding status changes are
> + * allowed. It is false during recovery and becomes true when recovery
> + * ends. Even when true, it specifically means "allowed after recovery has
> + * fully completed".
> + *
> + * This flag helps prevent race conditions with the startup process's
> + * end-of-recovery actions. After the startup process updates the logical
> + * decoding status at recovery end, other processes might attempt to
> + * toggle logical decoding before recovery fully completes (i.e.,
> + * RecoveryInProgress() returns false) - a period when WAL writes are
> + * still not permitted. Therefore, when this flag is true, we must wait
> + * for recovery to fully complete before attempting an activation or a
> + * deactivation.
> + */
> + bool status_change_allowed;
>
> This is a good explanation of this flag but it would be better if we
> could additionally explain why we can't rely on end-of-recovery to
> allow toggling of logical decoding status? I think some reference or
> mention of the same is required in the below place.
>
> +/*
> + * Update the logical decoding status at end of the recovery. This function
> + * must be called before accepting writes.
> + */
> +void
> +UpdateLogicalDecodingStatusEndOfRecovery(void)
>
> 7.
> If
> + * logical decoding was enabled before the last shutdown, it remains
> + * enabled as we might have set wal_level='logical' or have a few logical
> + * slots.
> + */
> + UpdateLogicalDecodingStatus(last_status, false);
>
> How about making the last part of the comment a bit more precise by
> changing it to the following?
> /or have a few logical slots./or have at least one logical slot.
Agreed, and fixed the above points.
>
> 8.
> +start_logical_decoding_status_change(bool new_status)
> {
> …
> + if (!new_status && CheckLogicalSlotExists())
> + {
> + LogicalDecodingCtl->pending_disable = false;
> + LWLockRelease(LogicalDecodingControlLock);
> + return false;
> + }
> …
> …
> + /* Return if we don't need to change the status */
> + if (LogicalDecodingCtl->logical_decoding_enabled == new_status)
> + {
> + LogicalDecodingCtl->pending_disable = false;
> + LWLockRelease(LogicalDecodingControlLock);
> + return false;
> + }
>
> Checking the existence of logical slots could be costlier, so I think
> it would be better to first check whether we really need to change the
> status.
Yes, it makes sense to check first if we really need to change the
status after waiting for a concurrent status change to complete.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2025-11-12 02:05:41 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Previous Message | Peter Smith | 2025-11-12 02:01:24 | Re: pg_createsubscriber --dry-run logging concerns |