Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-07-21 05:59:53
Message-ID: CAFiTN-vASQk815N9xEdSACGdSYOmWWEq1cmBDW9g0NMmNH6sog@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 21, 2025 at 10:08 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Sat, Jul 19, 2025 at 5:10 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jul 18, 2025 at 11:31 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jul 18, 2025 at 11:25 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > Okay, I see your point. Yes, it was non-blocking earlier, but it was
> > > > not raising an ERROR; it was just logging to the logfile that the
> > > > primary is behind and thus slot-sync could not be done.
> > > >
> > > > If we continue using the non-blocking mode, there’s a risk that the
> > > > API may never successfully sync the slots. This is because it
> > > > eventually drops the temporary slot on exit, and when it tries to
> > > > create a new one later on a subsequent call, it’s likely that the new
> > > > slot will again be ahead of the primary. This may happen if we have
> > > > continuous ongoing writes on the primary and the logical slot is not
> > > > being consumed at the same pace.
> > > >
> > > > My preference would be to avoid including such an option as it is
> > > > confusing. With such an option in place, users may think that
> > > > slot-sync is completed while that may not be the case.
> > >
> > > Fair enough
> > >
> >
> > I think, if we want, we could return a bool and return false when the
> > sync is not complete, say due to promotion or some other reason like a
> > timeout. However, at this stage it is not very clear whether it will be
> > useful to provide an additional timeout parameter. But we can consider
> > returning true/false depending on whether we are successful in syncing
> > the slots or not.
>
> I am not sure that such a return value adds anything in the current
> scenario. Since this function will wait indefinitely until all the
> slots are synced, it is supposed to return true in normal scenarios.
> If it is interrupted by promotion or the user cancels it manually,
> then it is supposed to return false. But in those cases, a more
> helpful approach would be to log a clear WARNING or ERROR message
> like "sync interrupted by promotion" (or similar reasons), rather
> than relying on a return value. In the future, if we plan to add a
> timeout parameter, then this return value makes more sense even in
> normal scenarios, as the function can easily return false if the
> timeout is short, the number of slots is large, or the slots are
> stuck waiting on the primary.
>
> Additionally, if we do return a value, there may be an expectation
> that the API should also provide details on the list of slots that
> couldn't be synced. That could introduce unnecessary complexity at
> this stage. We can avoid it for now and consider adding such
> enhancements later if we receive relevant customer feedback. Please
> note that our recommended approach for syncing slots still remains the
> 'slot sync worker' method.

+1
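
To make the concern about the non-blocking mode quoted above concrete, here is a
toy model in Python. It is only a sketch under stated assumptions: the class,
the function names, the integer "LSN" counters and the rates are all
hypothetical and do not correspond to the actual slot-sync code or to the
patch under discussion. It only shows why recreating the temporary slot on
every call can keep it ahead of the primary, while waiting on a single slot
eventually succeeds.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy state (hypothetical): two LSN-like counters advanced once per tick."""
    primary_slot_lsn: int     # how far the logical slot on the primary has advanced
    standby_reserve_lsn: int  # where a newly created slot on the standby would start
    consume_rate: int         # per-tick advance of the primary's slot
    write_rate: int           # per-tick advance of the standby's reserve point

    def tick(self) -> None:
        self.primary_slot_lsn += self.consume_rate
        self.standby_reserve_lsn += self.write_rate


def sync_non_blocking(c: Cluster) -> bool:
    """One non-blocking attempt: create a fresh temporary slot and check once."""
    local_slot_lsn = c.standby_reserve_lsn
    # If the primary is behind, the temporary slot is dropped on exit,
    # so nothing is carried over to the next call.
    return c.primary_slot_lsn >= local_slot_lsn


def retry_non_blocking(c: Cluster, calls: int, ticks_between_calls: int = 1) -> bool:
    """Call the non-blocking variant repeatedly while the workload continues."""
    for _ in range(calls):
        if sync_non_blocking(c):
            return True
        for _ in range(ticks_between_calls):
            c.tick()
    return False


def sync_blocking(c: Cluster, max_ticks: int = 10_000) -> bool:
    """Blocking attempt: keep the same local slot and wait for the primary.

    max_ticks is only a safety bound for this sketch; the behaviour discussed
    in the thread is to wait indefinitely.
    """
    local_slot_lsn = c.standby_reserve_lsn  # fixed for the whole wait
    for _ in range(max_ticks):
        if c.primary_slot_lsn >= local_slot_lsn:
            return True
        c.tick()
    return False
```

With equal rates and an initial gap, for example
Cluster(primary_slot_lsn=100, standby_reserve_lsn=200, consume_rate=5,
write_rate=5), retry_non_blocking() never succeeds because each new slot
starts the same distance ahead, whereas sync_blocking() returns once the
primary's slot reaches the fixed local position.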

--
Regards,
Dilip Kumar
Google
