Re: Issue with logical replication slot during switchover

From: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: Issue with logical replication slot during switchover
Date: 2025-10-31 16:45:33
Message-ID: CAA5-nLAPrE729BiCGKKUU_9b+CA2nxrKLyc-W+SbmU2ojFeehQ@mail.gmail.com
Lists: pgsql-hackers

Thanks for your thorough analysis and explanation, Alexander. If we go in the
direction you propose and keep the option of a supplementary allow_overwrite
flag, may I suggest some modifications to the v0 patch you attached:

Why modify this function?

drop_local_obsolete_slots(List *remote_slot_list)

List *local_slots = get_local_synced_slots(); => since the failover slot we
are checking has synced = false, it never enters the loop and is therefore
never dropped.
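
For context, here is a minimal sketch of the filter in get_local_synced_slots()
as I read the current slotsync.c (simplified, not the exact upstream code):

    /*
     * Simplified sketch: only slots already marked synced are collected, so a
     * slot with synced = false never reaches drop_local_obsolete_slots().
     */
    static List *
    get_local_synced_slots(void)
    {
        List       *local_slots = NIL;

        LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

        for (int i = 0; i < max_replication_slots; i++)
        {
            ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

            /* skipped when synced = false, e.g. a promoted primary's failover slot */
            if (s->in_use && s->data.synced)
                local_slots = lappend(local_slots, s);
        }

        LWLockRelease(ReplicationSlotControlLock);

        return local_slots;
    }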

If we want to resynchronize the slot properly, why not drop it and recreate it
cleanly instead of setting the synced flag to true even though the slot is not
actually synchronized? Here is the code I wrote in patch version 6, with the
check on the failover flag:

retry:
    /* Search for the named slot */
    if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
    {
        bool        synced;
        bool        failover;

        SpinLockAcquire(&slot->mutex);
        synced = slot->data.synced;
        failover = slot->data.failover;
        SpinLockRelease(&slot->mutex);

        if (!synced)
        {
            /* User-created slot with the same name exists, raise ERROR. */
            if (!failover)
                ereport(ERROR,
                        errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                        errmsg("exiting from slot synchronization because same"
                               " name slot \"%s\" already exists on the standby",
                               remote_slot->name));

            /*
             * At some point we were a primary, so the slot is expected to have
             * synced = false and failover = true.  To resynchronize the slot we
             * drop it and replay the code so that it is recreated cleanly.
             */
            ereport(LOG,
                    errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                    errmsg("slot \"%s\" already exists on the standby but will"
                           " be dropped and recreated to be resynchronized",
                           remote_slot->name));

            /* Get rid of a replication slot that is no longer wanted */
            ReplicationSlotAcquire(remote_slot->name, true, false);
            ReplicationSlotDropAcquired();

            goto retry;
        }
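
As a quick sanity check (slot name 'foo' is only an example), the flags on the
standby can be inspected before and after the drop/recreate, e.g.:

    SELECT slot_name, failover, synced
    FROM pg_replication_slots
    WHERE slot_name = 'foo';

Before resynchronization, the leftover slot from the old primary shows
failover = true and synced = false; once the slotsync worker has recreated it,
both flags should be true.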

Thanks for your appreciation and feedback.

Regards,

Fabrice

On Fri, Oct 31, 2025 at 10:28 AM Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
wrote:

> Hi,
>
> On Fri, 31 Oct 2025 at 09:16, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
> wrote:
>
>> Hi,
>> I indeed proposed a solution at the top of this thread to modify only the
>> value of the synced attribute, but the discussion was redirected to adding
>> an extra parameter to the function pg_create_logical_replication_slot()
>> to overwrite a failover slot
>>
>>> We had discussed this point in another thread, please see [1]. After
>>> discussion it was decided to not go this way.
>>>
>>> [1]:
>>> https://www.postgresql.org/message-id/OS0PR01MB57161FF469DE049765DD53A89475A%40OS0PR01MB5716.jpnprd01.prod.outlook.com
>>
>>
>
> I’ve read through the referenced discussion, and my impression is that we
> might be trying to design a solution around assumptions that are unlikely
> to hold in practice.
> There was an argument that at some point we might allow creating logical
> failover slots on cascading standbys. However, if we consider all practical
> scenarios, it seems very unlikely that such a feature could work reliably
> with the current design.
> Let me try to explain why.
>
> Consider the following setup:
> node1 - primary
> node2 - standby, replicating from node1
> node3 - standby, replicating from node1, has logical slot foo
> (failover=true, synced=false)
> node4 - standby, replicating from node3, has logical slot foo
> (failover=true, synced=true)
>
> 1) If node1 fails, we could promote either node2 or node3:
> 1.a) If we promote node2, we must first create a physical slot for node3,
> update primary_conninfo on node3 to point to node2, wait until node3
> connects, and until catalog_xmin on the physical slot becomes non-NULL.
> Only then would it be safe to promote node2. This introduces unnecessary
> steps, complexity, and waiting — increasing downtime, which defeats the
> goal of high availability.
> 1.b) If we promote node3, promotion itself is fast, but subscribers will
> still be using the slot on the original primary. This again defeats the
> purpose of doing logical replication from a standby, and it won’t be
> possible to switch subscribers to node4 (see below).
> 2) If node3 fails, we might want to replace it with node4. But node4 has a
> slot with failover=true and synced=true, and synced=true prevents it from
> being used for streaming because it’s a standby.
>
> In other words, with the current design, allowing creation of logical
> failover slots on standbys doesn’t bring any real benefit — such “synced”
> slots can’t actually be used later.
>
> One could argue that we could add a function to switch synced=true->false
> on a standby, but that would just add another workaround on top of an
> already fragile design, increasing operational complexity without solving
> the underlying issue.
>
> The same applies to proposals like allow_overwrite. If such a flag is
> introduced, in practice it will almost always be used unconditionally, e.g.:
> SELECT pg_create_logical_replication_slot('<name>', '<plugin>', failover
> := true, allow_overwrite := true);
>
> Right now, logical failover slots can’t be reused after a switchover,
> which is a perfectly normal operation.
> The only current workaround is to detect standbys with failover=true,
> synced=false and drop those slots, hoping they’ll be resynchronized. But
> resynchronization is asynchronous, unpredictable, and may take an unbounded
> amount of time. If the primary fails during that window, there might be no
> standby with ready logical slots.
>
> Instead of dropping such slots, what we actually need is a way to safely
> set synced=false->true and continue operating.
>
> Operating logical replication setups is already extremely complex and
> error-prone — this is not theoretical, it’s something many of us face daily.
> So rather than adding more speculative features or workarounds, I think we
> should focus on addressing real operational pain points and the
> inconsistencies in the current design.
>
> A slot created on the primary (which later becomes a standby) with
> failover=true has a very clear purpose. The failover flag already indicates
> that purpose; synced shouldn’t override it.
>
> Regards,
> --
> Alexander Kukushkin
>
