Re: Using failover slots for PG-non_PG logical replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Using failover slots for PG-non_PG logical replication
Date: 2025-07-09 03:00:38
Message-ID: CAJpy0uBb1RNFFP59Z7hUb=aW6AF+=AQnJep7cNAt9tKu9NC7Jw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 8, 2025 at 7:29 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, Jul 8, 2025 at 4:03 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Mon, Jul 7, 2025 at 7:00 PM Ashutosh Bapat
> > <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jul 7, 2025 at 9:53 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > Thanks for the patch.
> > > >
> > > > > I couldn't figure out whether the query on primary to fetch all the
> > > > > slots to be synchronized should filter based on invalidation_reason
> > > > > and conflicting or not. According to synchronize_slots(), it seems
> > > > > that we retain invalidated slots on standby when failover = true and
> > > > > they would remain with synced = true on standby. Is that right?
> > > > >
> > > >
> > > > Yes, that’s correct. We sync the invalidation status of replication
> > > > slots from the primary to the standby and then stop synchronizing any
> > > > slots that have been marked as invalidated, retaining synced flag as
> > > > true. IMO, there's no need to filter out conflicting slots on the
> > > > primary, because instead of excluding them there and showing
> > > > everything as failover-ready on the standby, the correct approach is
> > > > to reflect the actual state on standby.This means conflicting slots
> > > > will appear as non-failover-ready on the standby. That’s why Step 3
> > > > also considers conflicting flag in its evaluation.
> > >
> > > Thanks for the explanation. WFM.
> > >
> > > If a slot is invalidated because RS_INVAL_WAL_REMOVED, conflicting =
> > > false, but the slot is not useful standby. But then it's not useful on
> > > primary as well. Is that why we are not including (invalidation_reason
> > > IS NOT NULL) condition along in (synced AND NOT temporary AND NOT
> > > conflicting)?
> >
> > Thanks for bringing it up. Even if the slot is not useful on the
> > primary node as well, we shall still show failover-ready as false on
> > standby. We should modify the query of step3 to check
> > 'invalidation_reason IS NULL' instead of 'NOT conflicting'. That will
> > cover all the cases where the slot is invalidated and thus not
> > failover ready.
>
> Thanks for the confirmation.
>
> >
> > > >
> > > > 1)
> > > > +/* primary # */ SELECT array_agg(quote_literal(r.slot_name)) AS slots
> > > > + FROM pg_replication_slots r
> > > > + WHERE r.failover AND NOT r.temporary;
> > > >
> > > > On primary, there is no need to check temporary-status. We do not
> > > > allow setting failover as true for temporary slots.
> > >
> > > Why does synchronize_slots() has it in its query?
> > >
> >
> > It is not needed but no harm in maintaining it.
> > If we want documents to be in sync with code, we can have 'not
> > temporary' check in doc as well.
> >
>
> I think it's better to keep the code and the doc in sync otherwise we
> developers would get confused.
>
> > > >
> > > > 2)
> > > > Although not directly related to the concerns addressed in the given
> > > > patch, I think it would be helpful to add a note in the original doc
> > > > stating that Steps 1 and 2 should be executed on each subscriber node
> > > > that will be served by the standby after failover.
> > >
> > > There's a bit of semantic repeatition here since an earlier paragraph
> > > mentions a given subscriber. But I think overall it's better to be
> > > more clear than being less clear.
> > > >
> > > > I have attached a top-up patch with the above changes and a few more
> > > > trivial changes. Please include it if you find it okay.
> > >
> > > Thanks. Included. I have made a few edits and included them in the
> > > attached patch.
> > >
> >
> > Thanks. The existing changes (originally targeted in this patch) look
> > good to me.
> >
> > I have attached a top-up patch for step-3 correction. Please include
> > if you find it okay to be fixed in the same patch, otherwise we can
> > handle it separately.
>
> I have split your top up patch into 2 - one related to the document
> change being the subject of this thread and the other for fixing the
> query. Committer may squash the patch, if they think so.
>

The changes look good to me.

thanks
Shveta

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2025-07-09 03:04:55 Re: Can can I make an injection point wait occur no more than once?
Previous Message Michael Paquier 2025-07-09 02:45:54 Re: A assert failure when initdb with track_commit_timestamp=on