Re: Issue with logical replication slot during switchover

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: Issue with logical replication slot during switchover
Date: 2025-10-31 09:28:29
Message-ID: CAFh8B=kH4Nwd_69fWP4VxK9tjxxBUzdxEZLd+LCby9tbSMTcRA@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Fri, 31 Oct 2025 at 09:16, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
wrote:

> Hi,
> I indeed proposed a solution at the top of this thread to modify only the
> value of the synced attribute, but the discussion was redirected to adding
> an extra parameter to the function pg_create_logical_replication_slot() to
> overwrite a failover slot
>
>> We had discussed this point in another thread, please see [1]. After
>> discussion it was decided to not go this way.
>>
>> [1]:
>> https://www.postgresql.org/message-id/OS0PR01MB57161FF469DE049765DD53A89475A%40OS0PR01MB5716.jpnprd01.prod.outlook.com
>
>

I’ve read through the referenced discussion, and my impression is that we
might be trying to design a solution around assumptions that are unlikely
to hold in practice.
There was an argument that at some point we might allow creating logical
failover slots on cascading standbys. However, if we consider all practical
scenarios, it seems very unlikely that such a feature could work reliably
with the current design.
Let me try to explain why.

Consider the following setup:
node1 - primary
node2 - standby, replicating from node1
node3 - standby, replicating from node1, has logical slot foo
(failover=true, synced=false)
node4 - standby, replicating from node3, has logical slot foo
(failover=true, synced=true)

1) If node1 fails, we could promote either node2 or node3:
1.a) If we promote node2, we must first create a physical slot for node3,
update primary_conninfo on node3 to point to node2, wait until node3
connects, and until catalog_xmin on the physical slot becomes non-NULL.
Only then would it be safe to promote node2. This introduces unnecessary
steps, complexity, and waiting — increasing downtime, which defeats the
goal of high availability.
1.b) If we promote node3, promotion itself is fast, but subscribers will
still be using the slot on the original primary. This again defeats the
purpose of doing logical replication from a standby, and it won’t be
possible to switch subscribers to node4 (see below).
2) If node3 fails, we might want to replace it with node4. But node4 has a
slot with failover=true and synced=true, and synced=true prevents it from
being used for streaming because it’s a standby.
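
To make the waiting in step 1.a concrete, here is a rough sketch of what would have to run on node2 before it could be promoted. The slot name node3_slot is made up for illustration, and I'm assuming node3 sets primary_slot_name to that slot and has hot_standby_feedback enabled, since otherwise catalog_xmin on the physical slot would not be expected to be set:

```sql
-- On node2: create the physical slot that node3 will stream from.
-- 'node3_slot' is a hypothetical name.
SELECT pg_create_physical_replication_slot('node3_slot');

-- After node3 has been repointed at node2 and this slot,
-- poll until the slot is active and catalog_xmin is no longer NULL.
SELECT slot_name, active, catalog_xmin
FROM pg_replication_slots
WHERE slot_type = 'physical'
  AND slot_name = 'node3_slot';
```

The time spent repeating the second query until catalog_xmin shows up is the extra downtime mentioned in 1.a.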

In other words, with the current design, allowing creation of logical
failover slots on standbys doesn’t bring any real benefit — such “synced”
slots can’t actually be used later.

One could argue that we could add a function to switch synced=true->false
on a standby, but that would just add another workaround on top of an
already fragile design, increasing operational complexity without solving
the underlying issue.

The same applies to proposals like allow_overwrite. If such a flag is
introduced, in practice it will almost always be used unconditionally, e.g.:
SELECT pg_create_logical_replication_slot('<name>', '<plugin>',
    failover := true, allow_overwrite := true);

Right now, logical failover slots can’t be reused after a switchover, which
is a perfectly normal operation.
The only current workaround is to find logical slots on standbys that have
failover=true, synced=false and drop them, hoping they’ll be resynchronized. But
resynchronization is asynchronous, unpredictable, and may take an unbounded
amount of time. If the primary fails during that window, there might be no
standby with ready logical slots.
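
For reference, the workaround looks roughly like the sketch below, run on the old primary after it has rejoined as a standby. It relies only on the failover, synced and temporary columns of pg_replication_slots. The slot name foo is the example from the setup above, and the condition in step 3 is my reading of "ready", not an exhaustive check:

```sql
-- 1. Find logical slots left over from when this node was the primary.
SELECT slot_name, plugin, database
FROM pg_replication_slots
WHERE pg_is_in_recovery()
  AND slot_type = 'logical'
  AND failover
  AND NOT synced;

-- 2. Drop the leftover slot so that slot synchronization can recreate it.
SELECT pg_drop_replication_slot('foo');

-- 3. Poll until the slot reappears as a persistent synced slot.
--    Until this returns a row, this standby has no usable copy of 'foo'.
SELECT slot_name
FROM pg_replication_slots
WHERE slot_name = 'foo'
  AND synced
  AND NOT temporary;
```

The gap between step 2 and the first row returned by step 3 is the window in which a primary failure would leave no standby with a ready slot.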

Instead of dropping such slots, what we actually need is a way to safely
set synced=false->true and continue operating.

Operating logical replication setups is already extremely complex and
error-prone — this is not theoretical, it’s something many of us face daily.
So rather than adding more speculative features or workarounds, I think we
should focus on addressing real operational pain points and the
inconsistencies in the current design.

A slot created on the primary (which later becomes a standby) with
failover=true has a very clear purpose. The failover flag already indicates
that purpose; synced shouldn’t override it.

Regards,
--
Alexander Kukushkin
