Re: Using failover slots for PG-non_PG logical replication

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Subject: Re: Using failover slots for PG-non_PG logical replication
Date: 2025-07-07 13:30:08
Message-ID: CAExHW5v-3gAMMOQOwF5RpsrGa=E7Y5BsFBePN_ndmWTWxPEZFw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 7, 2025 at 9:53 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> Thanks for the patch.
>
> > I couldn't figure out whether the query on primary to fetch all the
> > slots to be synchronized should filter based on invalidation_reason
> > and conflicting or not. According to synchronize_slots(), it seems
> > that we retain invalidated slots on standby when failover = true and
> > they would remain with synced = true on standby. Is that right?
> >
>
> Yes, that’s correct. We sync the invalidation status of replication
> slots from the primary to the standby and then stop synchronizing any
> slots that have been marked as invalidated, retaining synced flag as
> true. IMO, there's no need to filter out conflicting slots on the
> primary, because instead of excluding them there and showing
> everything as failover-ready on the standby, the correct approach is
> to reflect the actual state on standby.This means conflicting slots
> will appear as non-failover-ready on the standby. That’s why Step 3
> also considers conflicting flag in its evaluation.

Thanks for the explanation. WFM.

If a slot is invalidated because RS_INVAL_WAL_REMOVED, conflicting =
false, but the slot is not useful standby. But then it's not useful on
primary as well. Is that why we are not including (invalidation_reason
IS NOT NULL) condition along in (synced AND NOT temporary AND NOT
conflicting)?

>
> 1)
> +/* primary # */ SELECT array_agg(quote_literal(r.slot_name)) AS slots
> + FROM pg_replication_slots r
> + WHERE r.failover AND NOT r.temporary;
>
> On primary, there is no need to check temporary-status. We do not
> allow setting failover as true for temporary slots.

Why does synchronize_slots() has it in its query?

>
> 2)
> Although not directly related to the concerns addressed in the given
> patch, I think it would be helpful to add a note in the original doc
> stating that Steps 1 and 2 should be executed on each subscriber node
> that will be served by the standby after failover.

There's a bit of semantic repeatition here since an earlier paragraph
mentions a given subscriber. But I think overall it's better to be
more clear than being less clear.
>
> I have attached a top-up patch with the above changes and a few more
> trivial changes. Please include it if you find it okay.

Thanks. Included. I have made a few edits and included them in the
attached patch.

--
Best Wishes,
Ashutosh Bapat

Attachment Content-Type Size
0001-Clarify-logical-replication-failover-docume-20250707.patch text/x-patch 3.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2025-07-07 13:38:58 Re: Consider explicit incremental sort for Append and MergeAppend
Previous Message Nazir Bilal Yavuz 2025-07-07 13:19:39 Re: C11 / VS 2019