Re: Issue with logical replication slot during switchover

From: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: Issue with logical replication slot during switchover
Date: 2025-11-04 11:14:54
Message-ID: CAA5-nLC=3QMehrktEB1MOrAk_ynG_u=pD_iUOup4MmkN2Xv0sA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alexander,
I understand your use case but I think it won't work like this because of
how is implemented get_local_synced_slots()
this function will return slots which synced flag to true only.

in point 4 Start node1: synced flag is false and failover flag is true
then it won't enter the loop and the slot will not be dropped

static void
drop_local_obsolete_slots(List *remote_slot_list)
{
List *local_slots = get_local_synced_slots();

foreach_ptr(ReplicationSlot, local_slot, local_slots)
{
/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list)) => will check
of remote slot exist and if local slot is not invalidated
{
bool synced_slot;
...

We could change the function get_local_synced_slots to make it work

get_local_synced_slots(void)
{
List *local_slots = NIL;

LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

for (int i = 0; i < max_replication_slots; i++)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

/* Check if it is a synchronized slot */
if (s->in_use && (s->data.failover || s->data.synced))
{
Assert(SlotIsLogical(s));
local_slots = lappend(local_slots, s);

Thanks for your feedback

Regards,

Fabrice
....
On Mon, Nov 3, 2025 at 8:51 AM Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
wrote:

> Hi Fabrice,
>
> On Fri, 31 Oct 2025 at 17:45, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
> wrote:
>
>> Thanks for your large analyze and explanation Alexander. If we go in the
>> direction you propose and leave the option to use a suplementary flag
>> allow_overwrite, may I propose you some modifications in the patch v0 you
>> have attached:
>>
>> Why modifying this function?
>>
>> drop_local_obsolete_slots(List *remote_slot_list)
>>
>
> Think about the following case:
> 1. We have node1 (primary) and node2 (standby), with slot
> foo(failover=true)
> 2. We stop node1 for maintenance and promote node2
> 3. DROP replication slot on node2
> 4. Start node1
>
> In this scenario the logical slot "foo" will be left unattended and it
> needs to be dropped, because it doesn't exist on the primary.
>
> As I said earlier, IMO it is only the "failover" flag that effectively
> indicates the purpose of the slot. The sync flag is purely informative.
>
> Regards,
> --
> Alexander Kukushkin
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-11-04 11:17:02 Re: [PATCH] Add error message for out-of-memory in passwordFromFile()
Previous Message Ashutosh Bapat 2025-11-04 10:58:55 Re: Report bytes and transactions actually sent downtream