Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-10-23 10:39:39
Message-ID: CAJpy0uC2oXafcySxQZMVruRU4xBdPgJE67T08q02h4wLApKPNg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Please find a few comments:

1)
RequestDisableLogicalDecoding:
Shall we have 'Assert(!MyReplicationSlot)' here to ensure that this
function is invoked after we have released and dropped the slot. This
is similar to 'Assert(MyReplicationSlot)' in
EnsureLogicalDecodingEnabled().

2)
EnsureLogicalDecodingEnabled sets status_change_inprogress to false
before writing the WAL record. Initially, I thought this could lead to
a situation where another session might drop the same slot, since
there’s nothing preventing it (as status_change_inprogress is false
and LogicalDecodingControlLock has been released). This could, in
theory, result in the checkpointer writing the WAL record that
disables logical decoding before EnsureLogicalDecodingEnabled() writes
its WAL record that enables it — potentially causing an issue. But
this problem could not be reproduced in practice, since the slot was
acquired by session1, and therefore another session attempting to drop
it couldn’t acquire it. That said, I still lean towards setting
status_change_inprogress = false after the WAL record has been written
in EnsureLogicalDecodingEnabled(). Thoughts?
If not, we could add a comment explaining why this scenario is not a problem.

3)
+ # Drop the logical slot, requesting to disable logical decoding to
the checkpointer.
+ # It has to wait for the recovery to complete before disabling
logical decoding.
+ $standby5->safe_psql('postgres',
+ qq[select pg_drop_replication_slot('standby5_slot');]);
+
+ # Resume the startup process to complete the recovery.
+ $standby5->safe_psql('postgres',
+ qq[select injection_points_wakeup('startup-logical-decoding-status-change-end-of-recovery')]
+ );
+
+ $standby5->wait_for_log(
+ "waiting for recovery completion to change logical decoding status");

Shouldn’t we check the log for "waiting for recovery completion..."
before triggering injection_points_wakeup?

IIUC, the current order may cause intermittent failures. Imagine that
drop-slot has not yet reached the LogicalDecodingStatusChangeAllowed
and RecoveryInProgress checks, and we release the injection point in
the meantime. In that case, drop-slot may never end up waiting, and we
might not see the expected log message.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2025-10-23 10:46:27 Re: Implement waiting for wal lsn replay: reloaded
Previous Message Yugo Nagata 2025-10-23 10:27:53 Re: Can we use Statistics Import and Export feature to perforamance testing?