Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: Japin Li <japinli(at)hotmail(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-10-31 05:01:51
Message-ID: ME0P300MB0445DF4AB98C1CC779E0138AB6F8A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Lists: pgsql-hackers

On Thu, 30 Oct 2025 at 19:15, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> Hi Ajin,
>
> I have reviewed v20 and got a few comments:
>
>> On Oct 30, 2025, at 18:18, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>>
>> <v20-0001-Improve-initial-slot-synchronization-in-pg_sync_.patch>
>
> 1 - slotsync.c
> ```
> + if (slot_names)
> + list_free_deep(slot_names);
>
> /* Cleanup the synced temporary slots */
> ReplicationSlotCleanup(true);
> @@ -1762,5 +2026,5 @@ SyncReplicationSlots(WalReceiverConn *wrconn)
> /* We are done with sync, so reset sync flag */
> reset_syncing_flag();
> }
> - PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
> + PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&fparams));
> ```
>
> I am afraid there is a risk of double memory free. Slot_names has been assigned to fparams.slot_names within the for loop, and it’s freed after the loop. If something gets wrong and slotsync_failure_callback() is called, the function will free fparams.slot_names again.
>

Agreed.

Maybe we should set fparams.slot_names to NIL immediately after freeing
the list, so that slotsync_failure_callback() finds nothing left to free.
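
To illustrate the idea outside of the backend, here is a minimal standalone
sketch. It is not the patch code: NameList, FailureParams, free_names_deep(),
failure_callback() and finish_sync() are hypothetical stand-ins for the List
of slot names, the fparams struct, list_free_deep() and
slotsync_failure_callback(), and plain malloc/free replaces palloc/pfree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the List of slot names used in the patch. */
typedef struct NameList
{
	int			count;
	char	  **names;
} NameList;

/* Hypothetical stand-in for the struct passed to the failure callback. */
typedef struct FailureParams
{
	NameList   *slot_names;
} FailureParams;

/* Counts how many times a list was really freed (demonstration only). */
static int	free_calls = 0;

/* Build a two-element list with heap-allocated copies of the names. */
static NameList *
make_names(const char *first, const char *second)
{
	const char *src[2] = {first, second};
	NameList   *list = malloc(sizeof(NameList));

	if (list == NULL)
		abort();
	list->count = 2;
	list->names = malloc(2 * sizeof(char *));
	if (list->names == NULL)
		abort();
	for (int i = 0; i < 2; i++)
	{
		list->names[i] = malloc(strlen(src[i]) + 1);
		if (list->names[i] == NULL)
			abort();
		strcpy(list->names[i], src[i]);
	}
	return list;
}

/* Free the list and every name in it; a NULL list is a no-op. */
static void
free_names_deep(NameList *list)
{
	if (list == NULL)
		return;
	for (int i = 0; i < list->count; i++)
		free(list->names[i]);
	free(list->names);
	free(list);
	free_calls++;
}

/* Error-path cleanup: frees whatever the params still point to. */
static void
failure_callback(FailureParams *fparams)
{
	free_names_deep(fparams->slot_names);
	fparams->slot_names = NULL;
}

/* Normal-path cleanup: free once, then clear both references. */
static void
finish_sync(FailureParams *fparams, NameList **slot_names)
{
	free_names_deep(*slot_names);
	*slot_names = NULL;
	fparams->slot_names = NULL; /* callback now has nothing to free */
}
```

With both pointers cleared in finish_sync(), a later call to
failure_callback() becomes a no-op instead of a second free of the same
list.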

> 2 - slotsync.c
> ```
> + /*
> + * Fetch remote slot info for the given slot_names. If slot_names is NIL,
> + * fetch all failover-enabled slots. Note that we reuse slot_names from
> + * the first iteration; re-fetching all failover slots each time could
> + * cause an endless loop. Instead of reprocessing only the pending slots
> + * in each iteration, it's better to process all the slots received in
> + * the first iteration. This ensures that by the time we're done, all
> + * slots reflect the latest values.
> + */
> + remote_slots = fetch_remote_slots(wrconn, slot_names);
> +
> + /* Attempt to synchronize slots */
> + some_slot_updated = synchronize_slots(wrconn, remote_slots,
> + &slot_persistence_pending);
> +
> + /*
> + * If slot_persistence_pending is true, extract slot names
> + * for future iterations (only needed if we haven't done it yet)
> + */
> + if (slot_names == NIL && slot_persistence_pending)
> + {
> + slot_names = extract_slot_names(remote_slots);
> +
> + /* Update the failure structure so that it can be freed on error */
> + fparams.slot_names = slot_names;
> + }
> ```
>
> I am thinking if that could be a problem. As you now extract_slot_names() only in the first iteration, if a slot is dropped, and a new slot comes with the same name, will the new slot be incorrectly synced?
>

A slot name alone is not enough to distinguish the old slot from the new
one. In this case, the new slot's state will overwrite the old one's. I see
no harm in this, but please confirm whether it is the desired behavior.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
