Re: Issue with logical replication slot during switchover

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: Issue with logical replication slot during switchover
Date: 2025-11-17 10:40:09
Message-ID: CAFh8B=nJg2xinHYF2NWL_mt3E9gc6_JqaUVu+eoDYBuP9VKL3A@mail.gmail.com
Lists: pgsql-hackers

Hi Masahiko,

On Fri, 14 Nov 2025 at 23:32, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> Given the current behavior that we cannot create a logical slot with
> failover=true on the standby, it makes sense to me that we overwrite
> the pre-existing slot (with synced=false and failover=true) on the old
> primary by the slot (with synced=true and failover=true) on the new
> primary if their names, plugin and other properties matches and the
> pre-existing slot has lesser LSNs and XIDs than the one on the new
> primary.

On the one hand, the idea of having additional checks looks reasonable, but
when I look at the existing update_local_synced_slot() function, I find the
following:

    if (remote_dbid != slot->data.database ||
        remote_slot->two_phase != slot->data.two_phase ||
        remote_slot->failover != slot->data.failover ||
        strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) != 0 ||
        remote_slot->two_phase_at != slot->data.two_phase_at)
    {
        NameData plugin_name;

        /* Avoid expensive operations while holding a spinlock. */
        namestrcpy(&plugin_name, remote_slot->plugin);

        SpinLockAcquire(&slot->mutex);
        slot->data.plugin = plugin_name;
        slot->data.database = remote_dbid;
        slot->data.two_phase = remote_slot->two_phase;
        slot->data.two_phase_at = remote_slot->two_phase_at;
        slot->data.failover = remote_slot->failover;
        SpinLockRelease(&slot->mutex);

That is, if some properties of a synced slot on the standby don't match those
on the primary, we simply overwrite them.
I guess this is necessary because synchronization happens only
periodically, and between two runs a slot on the primary might have been
recreated with different properties.
Do we really need additional checks just to flip the synced flag?
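
To make the overwrite-on-mismatch pattern concrete outside the server code,
here is a minimal standalone sketch. The SlotProps and LocalSlot structs and
the sync_slot_props() function are hypothetical simplifications invented for
illustration; they are not the PostgreSQL definitions, and locking is left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the properties compared above. */
typedef struct SlotProps
{
    unsigned int database;
    bool         two_phase;
    bool         failover;
    char         plugin[64];
} SlotProps;

/* Hypothetical local slot: its properties plus the synced flag. */
typedef struct LocalSlot
{
    SlotProps data;
    bool      synced;
} LocalSlot;

/*
 * If any property of the local slot differs from the remote one,
 * overwrite all of them with the remote values. No property is
 * treated as a reason to refuse the update.
 *
 * Returns true if the local slot was changed.
 */
static bool
sync_slot_props(LocalSlot *local, const SlotProps *remote)
{
    assert(local != NULL && remote != NULL);

    if (remote->database == local->data.database &&
        remote->two_phase == local->data.two_phase &&
        remote->failover == local->data.failover &&
        strcmp(remote->plugin, local->data.plugin) == 0)
        return false;           /* already in sync, nothing to do */

    local->data = *remote;      /* mismatch: remote values win */
    return true;
}
```

In this sketch a mismatch in plugin or database is simply resolved in favor
of the remote side, which is the behavior I am pointing at in the excerpt.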

Regards,
--
Alexander Kukushkin
