Quick Links

Re: Improve pg_sync_replication_slots() to wait for primary to advance

From:	shveta malik <shveta(dot)malik(at)gmail(dot)com>
To:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Yilin Zhang <jiezhilove(at)126(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject:	Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date:	2026-01-30 09:48:36
Message-ID:	CAJpy0uBFvCH-37L82U7Dff1_6vruRHJn9hv5CFyLobaXWC+kVg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Jan 30, 2026 at 11:18 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, December 18, 2025 7:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Dec 17, 2025 at 3:58 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > Here is a small patch to fix it.
> > >
> >
> > Thanks, I've pushed the patch. BTW, looking at the code of slot_sync API code
> > path, I could think of the following improvements.
> >
> > *
> > if (remote_slot->confirmed_lsn > latestFlushPtr)
> > { update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
> >
> > /*
> > * Can get here only if GUC 'synchronized_standby_slots' on the
> > * primary server was not configured correctly.
> > */
> > ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
> > errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >
> > Can we change this ERROR to LOG even for API as now the API also retires to
> > sync the slots during initial sync?
> >
> > * The use of the slot_persistence_pending flag in the internal APIs seems to
> > be the reverse of what it should be. I mean to say that initially it should be
> > true and when we actually persist the slot then we can set it to false.
> >
> > * We can retry to sync all the slots present in the primary at the start of API,
> > not only temporary slots. If we do this then the previous point may not be
> > required. Also, please mention something
> > like: "It retries cyclically until all the failover slots that existed on primary at
> > the start of the function call are synchronized." in the function description [1]
> > as well.
>
> Here is the patch to address these points. The patch improves the function to
> retry for both slots that fail to persist and those persistent slots that have
> skipped subsequent synchronizations.
>

Thanks for the patch. Few comments:

1)
* If the SQL function pg_sync_replication_slots() is used to sync the slots,
* and if the slots are not ready to be synced and are marked as RS_TEMPORARY
* because of any of the reasons mentioned above, then the SQL function also
* waits and retries until the slots are marked as RS_PERSISTENT (which means
* sync-ready). Refer to the comments in SyncReplicationSlots() for more
* details.

We need to update the comment at the top of the file as well.

2)
Is there a reason we don't call should_resync_slot() in
synchronize_slots() after each synchronize_one_slot() call? If we did,
we could invoke it in a single place instead of calling it at multiple
locations as we do now.

3)
In commit message, we have mentioned pg_sync_replication_slot() at one
palce instead of pg_sync_replication_slots().

4)
In the test, can you please elaborate why we need
'synchronized_standby_slots' adjustment twice?

# Remove the standby from the synchronized_standby_slots list and reload the
# configuration.
$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
$primary->reload;

# Remove the standby from the synchronized_standby_slots and reduce the maximum
# walsender number.
$primary->append_conf(
'postgresql.conf', qq(
max_wal_senders = 2
synchronized_standby_slots = ''
));
$primary->restart;

5)
At the end of the test, where do we check that the API is no longer
continuing and has successfully finished? I see '$h->quit;', but is
that equivalent to verifying that the API is successfully over?

thanks
Shveta

In response to

RE: Improve pg_sync_replication_slots() to wait for primary to advance at 2026-01-30 05:48:06 from Zhijie Hou (Fujitsu)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kirill Reshke	2026-01-30 09:59:57	Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Previous Message	Álvaro Herrera	2026-01-30 09:46:11	Re: Flush some statistics within running transactions