Re: Issue with logical replication slot during switchover

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: Issue with logical replication slot during switchover
Date: 2025-11-20 20:34:30
Message-ID: CAD21AoDH+Nv62qoJ8P+2249u=HTyrWg3qGuk9vqOb014=zzWzg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 20, 2025 at 6:26 AM Fabrice Chapuis <fabrice636861(at)gmail(dot)com> wrote:
>
> > I think we need to clarify that suppose the standby has a slot with
> > failover=true and synced=false and the primary has the slot with the
> > same name, failover=true, and synced=true...
> I'm not sure I understand the semantics of the `synced` flag, but why can `synced` be true on a primary instance? AFAICS, `synced=true` means that the slot is inactive and is synchronized with a slot on a remote instance. On a primary, what does it mean for the synced flag to be set to true?

I think that synced can be true on the primary if the slot was
previously synced and the instance is now working as the primary. But
synced being true doesn't mean anything on the primary. The flag has
an effect only on the standby.

> There's already an open thread dealing with this issue [1].
> The problem I see is being able to distinguish between 2 situations:
> 1) A failover slot has been created on a standby (failover=true and synced=false) in a cascading standby setup. In this case the slot must not be deleted.
> 2) A former primary has a slot (failover=true and synced=false) that must be resynchronized and that can be overwritten.

Right.
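
To make the ambiguity concrete, here is a minimal sketch in Python. The
SlotState type and both example values are hypothetical; they only model
the failover and synced flags discussed above and are not PostgreSQL code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotState:
    """Hypothetical model of the two slot flags discussed in this thread."""
    failover: bool
    synced: bool


# Situation 1: a failover slot created directly on a standby
# (cascading setup). It must not be deleted.
created_on_standby = SlotState(failover=True, synced=False)

# Situation 2: a slot left over on a former primary after a switchover.
# It should be resynchronized and may be overwritten.
left_on_former_primary = SlotState(failover=True, synced=False)

# The two flags alone cannot tell the situations apart.
indistinguishable = created_on_standby == left_on_former_primary
print(indistinguishable)
```

Both situations produce the same flag values, which is why some extra
piece of information would be needed to treat them differently.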

> Why not use slot metadata (allow_overwrite) to treat these two situations separately?

I'm not sure that the allow_overwrite idea is the best approach. For
example, suppose that in a cascading replication setup (node1 ->
node2 -> node3) we create a failover slot on node2 (failover=true,
synced=false, and allow_overwrite=false), and the slot is synchronized
to node3 (failover=true, synced=true, allow_overwrite=false). If we
do a switchover between node2 and node3, node3 connects to the primary,
node1, and node2 now joins node3 as a cascaded standby (i.e., the
replication setup is now node1 -> node3 -> node2). I guess that in
this case we would want the slot on node2 to be overwritten by the one
on node3, but that's not allowed because the slot on node2 has
allow_overwrite=false.
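
The scenario can be walked through with a small Python model. This is
only a sketch: Slot, SyncRefused, and sync_slot are hypothetical names,
and the rule that a sync refuses to replace a local slot with
synced=false and allow_overwrite=false is my assumption about how the
proposed flag would behave, not existing PostgreSQL behavior.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Slot:
    """Hypothetical model of a failover slot with the proposed flag."""
    failover: bool
    synced: bool
    allow_overwrite: bool


class SyncRefused(Exception):
    """Raised when the assumed rule forbids overwriting a local slot."""


def sync_slot(upstream: Slot, local: Optional[Slot]) -> Slot:
    """Copy an upstream slot to a standby under the assumed rule."""
    # Assumed rule: a locally created slot (synced=false) may only be
    # replaced when allow_overwrite is set.
    if local is not None and not local.synced and not local.allow_overwrite:
        raise SyncRefused("local slot has allow_overwrite=false")
    # The copy keeps the upstream flags but is marked as synced.
    return replace(upstream, synced=True)


# Before the switchover: node1 -> node2 -> node3.
node2_slot = Slot(failover=True, synced=False, allow_overwrite=False)
node3_slot = sync_slot(node2_slot, local=None)

# After the switchover: node1 -> node3 -> node2.
# node2 now tries to sync the slot from node3.
try:
    node2_slot = sync_slot(node3_slot, local=node2_slot)
    blocked = False
except SyncRefused:
    blocked = True

print(node3_slot, blocked)
```

In this model the sync to node2 is refused after the switchover, even
though overwriting is what we would want there.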

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
