| From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Yilin Zhang <jiezhilove(at)126(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Subject: | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
| Date: | 2026-02-02 09:10:25 |
| Message-ID: | CAJpy0uDDrgrCNJUCVnH1GD_aWScD7kS+NwV_C2Ve=S7ufikFEA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Feb 2, 2026 at 12:56 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Friday, January 30, 2026 5:49 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> >
> > Thanks for the patch. Few comments:
>
> Thanks for the comments.
>
> >
> > 1)
> > * If the SQL function pg_sync_replication_slots() is used to sync the slots,
> > * and if the slots are not ready to be synced and are marked as
> > RS_TEMPORARY
> > * because of any of the reasons mentioned above, then the SQL function also
> > * waits and retries until the slots are marked as RS_PERSISTENT (which
> > means
> > * sync-ready). Refer to the comments in SyncReplicationSlots() for more
> > * details.
> >
> > We need to update the comment at the top of the file as well.
>
> Updated.
>
> >
> > 2)
> > Is there a reason we don't call should_resync_slot() in
> > synchronize_slots() after each synchronize_one_slot() call? If we did, we could
> > invoke it in a single place instead of calling it at multiple locations as we do
> > now.
>
> Since should_resync_slot() can only be invoked when the slot is acquired, it
> cannot be called in synchronize_slots() where slot has been released. I thought
> of searching for that slot again, but that seems to bring unnecessary cost and
> codes.
Okay, got it.
> >
> > 3)
> > In commit message, we have mentioned pg_sync_replication_slot() at one
> > palce instead of pg_sync_replication_slots().
>
> Updated.
>
> >
> > 4)
> > In the test, can you please elaborate why we need
> > 'synchronized_standby_slots' adjustment twice?
> >
> > # Remove the standby from the synchronized_standby_slots list and reload
> > the # configuration.
> > $primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
> > $primary->reload;
> >
> > # Remove the standby from the synchronized_standby_slots and reduce the
> > maximum # walsender number.
> > $primary->append_conf(
> > 'postgresql.conf', qq(
> > max_wal_senders = 2
> > synchronized_standby_slots = ''
> > ));
> > $primary->restart;
>
> Sorry, it's a copy-paster error, removed one of them.
>
> >
> > 5)
> > At the end of the test, where do we check that the API is no longer continuing
> > and has successfully finished? I see '$h->quit;', but is that equivalent to
> > verifying that the API is successfully over?
>
> The test first confirms that slot_skip_reason is set to NULL,
> indicating successful synchronization, and then confirms that the backend has exited.
> (If the function is blocking, the `$h->quit;` cannot pass.)
>
> Here are the updated patches.
>
Thanks for the patch. Few trivial comments:
1)
+ * and if the slots are not ready to be synced because of any of the reasons
+ * mentioned above, then the SQL function also waits and retries
until the slots
+ * are synchronized to the latest information. Refer to the comments
+ * in SyncReplicationSlots() for more details.
We can make it slightly more clear by mentioning that it waits only
for the slots which existed at the start of function:
"...then the SQL function also waits and retries until the failover
slots that existed on primary at the start of the function call are
synchronized."
2)
We have below comment atop SyncReplicationSlots:
* Repeatedly fetches and updates replication slot information from the
* primary until all slots are at least "sync ready".
We shall change this too, as now we are not only waiting for them to
be sync-ready. Even if they are sync-ready, this function can still
wait in subsequent runs for different reasons.
3)
Existing test in test file:
##################################################
# Test that pg_sync_replication_slots() on the standby skips and retries
# until the slot becomes sync-ready (when the remote slot catches up with
# the locally reserved position).
# Also verify that slotsync skip statistics are correctly updated when the
# slotsync operation is skipped.
##################################################
New one added says:
+##################################################
+# Test that when physical replication lags behind logical replication,
+# pg_sync_replication_slots() on the standby skips and retries until physical
+# replication catches up. Also verify that slotsync skip statistics are
+# correctly updated when the slotsync operation is skipped.
+##################################################
The "need" of this new test case isn't very clear provided we already
have testcase1. Perhaps we could revise the comment to something like:
"Test that even for a sync-ready slot, when physical replication lags
behind logical replication, pg_sync_replication_slots()
on the standby skips........"
thanks
Shveta
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mihail Nikalayeu | 2026-02-02 09:17:42 | Re: Adding REPACK [concurrently] |
| Previous Message | Ashutosh Bapat | 2026-02-02 08:36:07 | Re: Report bytes and transactions actually sent downtream |