| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> | 
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> | 
| Cc: | Japin Li <japinli(at)hotmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Improve pg_sync_replication_slots() to wait for primary to advance | 
| Date: | 2025-10-30 11:15:34 | 
| Message-ID: | EA43BF2D-B12F-417E-B3FE-24DB359CC2D6@gmail.com | 
| Lists: | pgsql-hackers | 
Hi Ajin,
I have reviewed v20 and have a few comments:
> On Oct 30, 2025, at 18:18, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> 
> <v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch>
1 - slotsync.c
```
+		if (slot_names)
+			list_free_deep(slot_names);
 
 		/* Cleanup the synced temporary slots */
 		ReplicationSlotCleanup(true);
@@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
 		/* We are done with sync, so reset sync flag */
 		reset_syncing_flag();
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
+	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
```
I am afraid there is a risk of a double free. slot_names is assigned to fparams.slot_names within the for loop, and it is freed after the loop. If something then goes wrong and slotsync_failure_callback() is called, the callback will free fparams.slot_names again.
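To illustrate the concern, here is a minimal standalone sketch (not the patch code) of one common way to make such cleanup safe to run twice: clear the shared pointer when it is freed. FailureParams, release_slot_names(), failure_callback() and simulate_late_error() are hypothetical names used only for this example; in the patch the equivalent would presumably be resetting fparams.slot_names to NIL after list_free_deep(), but that is an assumption on my side.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parameters passed to the failure callback. */
typedef struct FailureParams
{
	char	  **slot_names;		/* owned array of owned strings, or NULL */
	int			nslots;
} FailureParams;

/* Counts real releases, so a caller can check the list was freed only once. */
static int	release_count = 0;

/* Free the name list and clear the pointer, so a second call is a no-op. */
static void
release_slot_names(FailureParams *fp)
{
	assert(fp != NULL);
	if (fp->slot_names == NULL)
		return;
	for (int i = 0; i < fp->nslots; i++)
		free(fp->slot_names[i]);
	free(fp->slot_names);
	fp->slot_names = NULL;
	fp->nslots = 0;
	release_count++;
}

/* Models the error-cleanup callback, which also wants to free the list. */
static void
failure_callback(FailureParams *fp)
{
	release_slot_names(fp);
}

/* Portable string copy (avoids relying on POSIX strdup). */
static char *
copy_name(const char *src)
{
	char	   *dst = malloc(strlen(src) + 1);

	if (dst != NULL)
		strcpy(dst, src);
	return dst;
}

/*
 * Build a list, free it on the normal path, then simulate an error that
 * fires the callback.  Returns how many times the list was really freed,
 * or -1 if the initial allocation failed.
 */
static int
simulate_late_error(void)
{
	FailureParams fp = {NULL, 0};

	release_count = 0;
	fp.slot_names = malloc(2 * sizeof(char *));
	if (fp.slot_names == NULL)
		return -1;
	fp.slot_names[0] = copy_name("slot_a");
	fp.slot_names[1] = copy_name("slot_b");
	fp.nslots = 2;

	release_slot_names(&fp);	/* normal-path cleanup after the loop */
	failure_callback(&fp);		/* a later error triggers the callback */
	return release_count;
}
```

Without the two lines that reset slot_names and nslots, the second call would free the same memory again.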
2 - slotsync.c
```
+			/*
+			 * Fetch remote slot info for the given slot_names. If slot_names is NIL,
+			 * fetch all failover-enabled slots. Note that we reuse slot_names from
+			 * the first iteration; re-fetching all failover slots each time could
+			 * cause an endless loop. Instead of reprocessing only the pending slots
+			 * in each iteration, it's better to process all the slots received in
+			 * the first iteration. This ensures that by the time we're done, all
+			 * slots reflect the latest values.
+			 */
+			remote_slots = fetch_remote_slots(wrconn, slot_names);
+
+			/* Attempt to synchronize slots */
+			some_slot_updated = synchronize_slots(wrconn, remote_slots,
+												  &slot_persistence_pending);
+
+			/*
+			 * If slot_persistence_pending is true, extract slot names
+			 * for future iterations (only needed if we haven't done it yet)
+			 */
+			if (slot_names == NIL && slot_persistence_pending)
+			{
+				slot_names = extract_slot_names(remote_slots);
+
+				/* Update the failure structure so that it can be freed on error */
+				fparams.slot_names = slot_names;
+			}
```
I am wondering whether this could be a problem. Since extract_slot_names() is now called only in the first iteration, if a slot is dropped and a new slot is created with the same name, will the new slot be incorrectly synced?
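A small standalone sketch of the scenario I have in mind, under stated assumptions: RemoteSlot, its incarnation field, name_in_list() and same_slot() are all hypothetical and do not exist in the patch. It only shows that a name-only filter cannot tell a re-created slot from the original one; whether the real code is affected is exactly my question.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of a slot fetched from the primary. */
typedef struct RemoteSlot
{
	const char *name;
	unsigned	incarnation;	/* hypothetical: changes when re-created */
} RemoteSlot;

/* Name-only lookup, modelling a filter built from the first iteration. */
static bool
name_in_list(const char *const *names, int n, const char *name)
{
	assert(names != NULL && name != NULL);
	for (int i = 0; i < n; i++)
	{
		if (strcmp(names[i], name) == 0)
			return true;
	}
	return false;
}

/* Identity check that also compares the hypothetical incarnation. */
static bool
same_slot(const RemoteSlot *a, const RemoteSlot *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a->name, b->name) == 0 &&
		a->incarnation == b->incarnation;
}
```

With a cached list containing "slot_a", a slot re-created under that name still passes name_in_list(), while same_slot() against the original entry fails.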
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/