Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-10-23 18:52:54
Message-ID: CAD21AoCiihiYfUk_8Wnb-FxbFY2HXr-NjD3g2aL=0naLfjxcPA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 23, 2025 at 3:39 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> Please find a few comments:
>
> 1)
> RequestDisableLogicalDecoding:
> Shall we have 'Assert(!MyReplicationSlot)' here to ensure that this
> function is invoked after we have released and dropped the slot. This
> is similar to 'Assert(MyReplicationSlot)' in
> EnsureLogicalDecodingEnabled().

I think the reason why we need 'Assert(MyReplicationSlot)' in
EnsureLogicalDecodingEnabled() is that otherwise the caller cannot get
the expected result. IOW, suppose w can call
EnsureLogicalDecodingEnabled() without holding a logical slot, logical
decoding is not enabled if all logical slots are dropped concurrently
in spite of the function name having the term "Ensure". As for
RequestDisableLogicalDecoding(), it can request to disable logical
decoding even while holding a logical slot. It might practically make
less sense but it's no problem in principle. I guess it makes sense to
put the assertion into DisableLogicalDecodingIfNecessary() instead.

>
> 2)
> EnsureLogicalDecodingEnabled sets status_change_inprogress to false
> before writing the WAL record. Initially, I thought this could lead to
> a situation where another session might drop the same slot, since
> there’s nothing preventing it (as status_change_inprogress is false
> and LogicalDecodingControlLock has been released). This could, in
> theory, result in the checkpointer writing the WAL record that
> disables logical decoding before EnsureLogicalDecodingEnabled() writes
> its WAL record that enables it — potentially causing an issue. But
> this problem could not be reproduced in practice, since the slot was
> acquired by session1, and therefore another session attempting to drop
> it couldn’t acquire it. That said, I still lean towards setting
> status_change_inprogress = false after the WAL record has been written
> in EnsureLogicalDecodingEnabled(). Thoughts?
> If not, we could add a comment explaining why this scenario is not a problem.

Good point. I'll incorporate your suggestion.

>
> 3)
> + # Drop the logical slot, requesting to disable logical decoding to
> the checkpointer.
> + # It has to wait for the recovery to complete before disabling
> logical decoding.
> + $standby5->safe_psql('postgres',
> + qq[select pg_drop_replication_slot('standby5_slot');]);
> +
> + # Resume the startup process to complete the recovery.
> + $standby5->safe_psql('postgres',
> + qq[select injection_points_wakeup('startup-logical-decoding-status-change-end-of-recovery')]
> + );
> +
> + $standby5->wait_for_log(
> + "waiting for recovery completion to change logical decoding status");
>
> Shouldn’t we check the log for "waiting for recovery completion..."
> before triggering injection_points_wakeup?
>
> IIUC, the current order may cause intermittent failures. Imagine that
> drop-slot has not yet reached the LogicalDecodingStatusChangeAllowed
> and RecoveryInProgress checks, and we release the injection point in
> the meantime. In that case, drop-slot may never end up waiting, and we
> might not see the expected log message.

Agreed.

I'll self-review the patch again and share the updated version patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Reshke 2025-10-23 19:21:35 Use log_newpage_range in HASH index build
Previous Message Nathan Bossart 2025-10-23 18:47:51 Re: another autovacuum scheduling thread