Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Yilin Zhang <jiezhilove(at)126(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2026-03-05 06:55:42
Message-ID: CAA4eK1+-X7te_cBna2P3OA1ojcberkgyh1rWYKbpnG5T_DkbuA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 4, 2026 at 1:06 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Mar 4, 2026 at 12:26 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Tuesday, February 17, 2026 12:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > On Tue, Feb 17, 2026 at 9:13 AM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > On Mon, Feb 16, 2026 at 4:35 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > wrote:
> > > > >
> > > > > On Fri, Feb 13, 2026 at 7:54 AM Zhijie Hou (Fujitsu)
> > > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > > >
> > > > > > Thanks for pushing! Here are the remaining patches.
> > > > > >
> > > > >
> > > > > One thing that bothers me about the remaining patch is that it could
> > > > > lead to infinite retries in the worst case. For example, in the first
> > > > > try, slot-1 is not synced, say due to physical replication delays in
> > > > > flushing WAL up to the confirmed_flush_lsn of that slot; then in the
> > > > > next (re-)try, the same thing happens for slot-2; then in the next
> > > > > (re-)try, slot-3 appears to be invalidated on the standby but is
> > > > > valid on the primary, and so on. What do you think?
> > > >
> > > > Yes, that is a possibility we cannot rule out. This can also happen
> > > > during the first invocation of the API (even without the new changes):
> > > > when we attempt to create new slots, they may remain in a temporary
> > > > state indefinitely. However, that risk is limited to the initial sync,
> > > > until the slots are persisted, which is somewhat expected behavior.
> > > >
> > >
> > > Right.
> > >
> > > > With the current changes though, the possibility of an indefinite wait
> > > > exists during every run. So the question becomes: what would be more
> > > > desirable for users -- for the API to finish with the risk that a few
> > > > slots are not synced, or for the API to wait longer to ensure that all
> > > > slots are properly synced?
> > > >
> > > > I think that if the primary use case of this API is when a user plans
> > > > to run it before a scheduled failover, then it would be better for the
> > > > API to wait and ensure everything is properly synced.
> > > >
> > >
> > > I don't think we can guarantee that all slots are synced as per latest primary
> > > state in one invocation because some newly created slots can anyway be
> > > missed. So why take the risk of infinite waits in the API? I think it may be
> > > better to extend the usage of this API (probably with more parameters) based
> > > on more user feedback.
> > >
> >
> > I agree that we could wait for more user feedback before extending the
> > retry logic to persisted slots.
> >
> > > > > Independent of whether we consider the entire patch, the following
> > > > > bit in the patch is useful when we retry to sync the slots via the API.
> > > > > @@ -218,7 +219,7 @@ update_local_synced_slot(RemoteSlot
> > > > > *remote_slot, Oid remote_dbid)
> > > > > * Can get here only if GUC 'synchronized_standby_slots' on the
> > > > > * primary server was not configured correctly.
> > > > > */
> > > > > - ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
> > > > > + ereport(LOG,
> > > > > errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > > errmsg("skipping slot synchronization because the received slot sync"
> > > > > " LSN %X/%08X for slot \"%s\" is ahead of the standby position
> > > > > %X/%08X",
> > > > >
> > > >
> > > > yes. I agree.
> > > >
> > >
> > > Let's wait for Hou-San's opinion on this one.
> >
> > +1 for changing this.
> >
> > Here is the patch set to convert the elevel to LOG so that the function
> > retries cyclically until the standby catches up and the slot is
> > successfully persisted.
> >
>
> patch001 has:
>
> - logical decoding and must be dropped after promotion. See
> + logical decoding and must be dropped after promotion. This function
> + retries cyclically until all the failover slots that existed on
> + primary at the start of the function call are synchronized. See
>
> IIUC, this is not completely true though. If the slot is persisted, we
> do not retry cyclically now; we skip the sync. Isn't it?
>

Right, but even if one of the slots is not persisted, it will try to
sync again, no?

> Shall this be changed to:
>
> It retries cyclically until all the failover slots that existed on
> primary at the start of the function call are persisted and are
> sync-ready.
>

This is too internals-specific. I find Hou-san's version better w.r.t.
user-facing docs.

* RS_TEMPORARY. Once the decoding from corresponding LSNs can reach a
* consistent point, they will be marked as RS_PERSISTENT.
*
+ * If the WAL prior to the remote slot's confirmed_flush_lsn has not been
+ * flushed on the standby, the slot is marked as RS_TEMPORARY. Once the standby
+ * catches up and flushes that WAL, the slot is promoted to RS_PERSISTENT.

BTW, I don't think we need these additional comments.
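
As an aside, for anyone following the thread: the temporary-vs-persisted
state being discussed is visible from SQL, and the potential long wait in
the API can be bounded from the client side. A hedged sketch (assuming the
standby runs PG 17+, where pg_replication_slots exposes the synced,
temporary, and failover columns):

```sql
-- On the standby: see which failover slots have been synced and whether
-- any are still temporary (i.e. not yet persisted).
SELECT slot_name, synced, temporary, failover
FROM pg_replication_slots
WHERE synced;

-- Bound the API's wait from the caller, since with the retry behavior
-- discussed above it could otherwise run for a long time.
SET statement_timeout = '60s';
SELECT pg_sync_replication_slots();
```

This is just an illustration of how a user could cap the wait themselves;
it is not part of the proposed patch.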

--
With Regards,
Amit Kapila.
