Re: Issue with logical replication slot during switchover

From: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: Issue with logical replication slot during switchover
Date: 2025-11-11 15:56:56
Message-ID: CAA5-nLASa+dhSXkifQJgisBB+c_pZyN_faYmH0nrEy05CSJoGQ@mail.gmail.com
Lists: pgsql-hackers

Hi Amit,

If I summarize your scenario:
1. A standby S has a failover slot slot1 synchronized with slot1 on the
primary P.
2. We promote S.
3. On P, we drop slot1 and create slot1 again with failover enabled (for
example, a subscriber exists on another instance).
4. A rewind is performed on P, the former primary, so that it rejoins S,
the former standby.
5. On P, slot1 is automatically dropped and recreated to be synchronized.
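
To make the steps above concrete, here is a minimal Python sketch of the
scenario as I understand it. It is a toy model, not PostgreSQL code: Slot,
create_slot, matches_by_name_and_failover and the creation_id field are
hypothetical names of mine, and creation_id stands in for a slot history
that PostgreSQL does not keep.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical creation counter: PostgreSQL keeps no such slot history.
_creation_ids = count(1)


@dataclass
class Slot:
    name: str
    failover: bool
    creation_id: int  # illustration only, not a real slot attribute


def create_slot(name: str, failover: bool) -> Slot:
    """Create a fresh slot with a new (hypothetical) creation id."""
    return Slot(name, failover, next(_creation_ids))


def matches_by_name_and_failover(a: Slot, b: Slot) -> bool:
    """The only comparison available when no slot history exists."""
    return a.name == b.name and a.failover == b.failover


def run_scenario() -> tuple[bool, bool]:
    # Step 1: slot1 exists on P, with a synchronized copy on S.
    slot_on_p = create_slot("slot1", failover=True)
    slot_on_s = Slot(slot_on_p.name, slot_on_p.failover, slot_on_p.creation_id)

    # Step 2: S is promoted; nothing changes for its slot1 in this model.
    # Step 3: on P, slot1 is dropped and created again with failover.
    slot_on_p = create_slot("slot1", failover=True)

    # Steps 4-5: after the rewind, P has to decide whether its local slot1
    # is "the same" slot as the one on S.
    looks_same = matches_by_name_and_failover(slot_on_p, slot_on_s)
    really_same = slot_on_p.creation_id == slot_on_s.creation_id
    return looks_same, really_same


if __name__ == "__main__":
    print(run_scenario())
```

In this model run_scenario() returns (True, False): the two slots match on
name and failover although the slot on P was recreated after the promotion.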

In which context could this kind of scenario happen?

Isn't the goal to find a solution for a switchover that is carried out for
maintenance on a Postgres cluster? The aim is to find a compromise that
covers the most likely scenarios.
Do you think we must go back to the allow_overwrite flag approach, or look
for another solution?

Best Regards,

Fabrice

On Mon, Nov 10, 2025 at 1:10 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Fri, Oct 31, 2025 at 2:58 PM Alexander Kukushkin <cyberdemn(at)gmail(dot)com> wrote:
> >
> > Instead of dropping such slots, what we actually need is a way to safely
> > set synced=false->true and continue operating.
> >
> > Operating logical replication setups is already extremely complex and
> > error-prone — this is not theoretical, it’s something many of us face daily.
> > So rather than adding more speculative features or workarounds, I think
> > we should focus on addressing real operational pain points and the
> > inconsistencies in the current design.
> >
> > A slot created on the primary (which later becomes a standby) with
> > failover=true has a very clear purpose. The failover flag already indicates
> > that purpose; synced shouldn’t override it.
> >
>
> I think this is not as clear as you are saying as compared to WAL. In
> failover cases, we bump the WAL timelines on new primary and also have
> facilities like pg_rewind to ensure that old primary can follow the
> new primary after divergence. For slots, there is no such facility,
> now, there is an argument that for slot's it is sufficient to match
> the name and failover to say that it is okay to overwrite the slot on
> old primary. However, it is not clear whether it is always safe to do
> so, for example, if the old primary ran after divergence for sometime
> and one has re-created the slot with same name and failover property,
> it will no longer be the same slot. Unlike WAL, we don't maintain the
> slot's history, so it is not equally clear that we can overwrite old
> primary's slot's as it is.
>
> --
> With Regards,
> Amit Kapila.
>
