| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Yilin Zhang <jiezhilove(at)126(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
| Date: | 2026-01-30 06:49:00 |
| Message-ID: | B594E0C2-8FAA-4444-88D9-5C6E8A305E42@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Jan 30, 2026, at 13:48, Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, December 18, 2025 7:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> On Wed, Dec 17, 2025 at 3:58 PM Zhijie Hou (Fujitsu)
>> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>>>
>>> Here is a small patch to fix it.
>>>
>>
>> Thanks, I've pushed the patch. BTW, looking at the code of slot_sync API code
>> path, I could think of the following improvements.
>>
>> *
>> if (remote_slot->confirmed_lsn > latestFlushPtr)
>> { update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
>>
>> /*
>> * Can get here only if GUC 'synchronized_standby_slots' on the
>> * primary server was not configured correctly.
>> */
>> ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
>> errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>
>> Can we change this ERROR to LOG even for API as now the API also retires to
>> sync the slots during initial sync?
>>
>> * The use of the slot_persistence_pending flag in the internal APIs seems to
>> be the reverse of what it should be. I mean to say that initially it should be
>> true and when we actually persist the slot then we can set it to false.
>>
>> * We can retry to sync all the slots present in the primary at the start of API,
>> not only temporary slots. If we do this then the previous point may not be
>> required. Also, please mention something
>> like: "It retries cyclically until all the failover slots that existed on primary at
>> the start of the function call are synchronized." in the function description [1]
>> as well.
>
> Here is the patch to address these points. The patch improves the function to
> retry for both slots that fail to persist and those persistent slots that have
> skipped subsequent synchronizations.
>
> Best Regards,
> Hou zj
> <v1-0002-Add-a-taptest.patch><v1-0001-Improve-the-retry-logic-in-pg_sync_replication_sl.patch>
Hi Zhijie,
Thanks for the patch. It’s really an improvement. After reviewing it, I have a few small comments.
1 - 0001
```
+/*
+ * Helper function to check if the slotsync was skipped and requires re-sync.
+ */
+static bool
+should_resync_slot(void)
+{
+ ReplicationSlot *slot;
+
+ Assert(MyReplicationSlot);
+
+ slot = MyReplicationSlot;
```
This is a purely a helper function, so I think it doesn’t have to use the global MyReplicationSlot. We can just pass in a slot, so that this function will be more reusable.
2 - 0001
```
+ * *slotsync_pending is set to true if the slot's synchronization is skipped and
+ * requires re-sync. See should_resync_slot() for cases requiring
+ * re-sync.
*
* Returns TRUE if the local slot is updated.
*/
static bool
synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid,
- bool *slot_persistence_pending)
+ bool *slotsync_pending)
```
This function only sets *slotsync_pending to true, so it relies on the callers to initiate it to false. If a caller forgets to initialize it to false, or wrongly set it to true, then when this function, the variable may contain an unexpected value. So I think it’s better to set *slotsync_pending to false in the beginning of this function.
3 - 0001
```
/* Done if all slots are persisted i.e are sync-ready */
- if (!slot_persistence_pending)
+ if (!slotsync_pending)
break;
```
I think this comment becomes stale with this patch and needs an update. Now it’s only done if persisted and should_resync_slot()==false.
4 - 0002
```
+$primary->adjust_conf('postgresql.conf', 'synchronized_standby_slots', "''");
+$primary->reload;
```
This reload seems not needed because the next step immediately restarts primary.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Shinya Kato | 2026-01-30 06:59:42 | Wake up backends immediately when sync standbys decrease |
| Previous Message | Dilip Kumar | 2026-01-30 06:35:59 | Re: Proposal: Conflict log history table for Logical Replication |