Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-10-24 03:16:19
Message-ID: CAJpy0uCKj33a-WzRq74-UXOW5d-gSTxbm9hpmrtcCWpXhG32iQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 24, 2025 at 12:23 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Oct 23, 2025 at 3:39 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > Please find a few comments:
> >
> > 1)
> > RequestDisableLogicalDecoding:
> > Shall we have 'Assert(!MyReplicationSlot)' here to ensure that this
> > function is invoked after we have released and dropped the slot. This
> > is similar to 'Assert(MyReplicationSlot)' in
> > EnsureLogicalDecodingEnabled().
>
> I think the reason why we need 'Assert(MyReplicationSlot)' in
> EnsureLogicalDecodingEnabled() is that otherwise the caller cannot get
> the expected result. IOW, suppose w can call
> EnsureLogicalDecodingEnabled() without holding a logical slot, logical
> decoding is not enabled if all logical slots are dropped concurrently
> in spite of the function name having the term "Ensure". As for
> RequestDisableLogicalDecoding(), it can request to disable logical
> decoding even while holding a logical slot. It might practically make
> less sense but it's no problem in principle.

Okay.

> I guess it makes sense to
> put the assertion into DisableLogicalDecodingIfNecessary() instead.
>

I thought there is no chance here that the checkpointer is holding any
slot when it calls DisableLogicalDecodingIfNecessary() and thus Assert
is not needed there. But that said, it is no harm if we have it there.

> >
> > 2)
> > EnsureLogicalDecodingEnabled sets status_change_inprogress to false
> > before writing the WAL record. Initially, I thought this could lead to
> > a situation where another session might drop the same slot, since
> > there’s nothing preventing it (as status_change_inprogress is false
> > and LogicalDecodingControlLock has been released). This could, in
> > theory, result in the checkpointer writing the WAL record that
> > disables logical decoding before EnsureLogicalDecodingEnabled() writes
> > its WAL record that enables it — potentially causing an issue. But
> > this problem could not be reproduced in practice, since the slot was
> > acquired by session1, and therefore another session attempting to drop
> > it couldn’t acquire it. That said, I still lean towards setting
> > status_change_inprogress = false after the WAL record has been written
> > in EnsureLogicalDecodingEnabled(). Thoughts?
> > If not, we could add a comment explaining why this scenario is not a problem.
>
> Good point. I'll incorporate your suggestion.
>
> >
> > 3)
> > + # Drop the logical slot, requesting to disable logical decoding to
> > the checkpointer.
> > + # It has to wait for the recovery to complete before disabling
> > logical decoding.
> > + $standby5->safe_psql('postgres',
> > + qq[select pg_drop_replication_slot('standby5_slot');]);
> > +
> > + # Resume the startup process to complete the recovery.
> > + $standby5->safe_psql('postgres',
> > + qq[select injection_points_wakeup('startup-logical-decoding-status-change-end-of-recovery')]
> > + );
> > +
> > + $standby5->wait_for_log(
> > + "waiting for recovery completion to change logical decoding status");
> >
> > Shouldn’t we check the log for "waiting for recovery completion..."
> > before triggering injection_points_wakeup?
> >
> > IIUC, the current order may cause intermittent failures. Imagine that
> > drop-slot has not yet reached the LogicalDecodingStatusChangeAllowed
> > and RecoveryInProgress checks, and we release the injection point in
> > the meantime. In that case, drop-slot may never end up waiting, and we
> > might not see the expected log message.
>
> Agreed.
>
> I'll self-review the patch again and share the updated version patch.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2025-10-24 03:16:39 Re: Can we use Statistics Import and Export feature to perforamance testing?
Previous Message Gregory Smith 2025-10-24 03:03:49 PG18 GIN parallel index build crash - invalid memory alloc request size