Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-08-20 05:22:58
Message-ID: CAFPTHDZ=pm2fVccK7XxvPm3kHX3CH-mVdnCwLdqOSDtpeS2rqA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 19, 2025 at 7:42 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Tue, Aug 19, 2025 at 10:55 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > Attaching patch v7 addressing all the above comments.
> >
>
> Thank You for the patches. Please find a few comments:
>
> 1)
> We are not resetting 'slot_persistence_pending' to false anywhere. So
> once it hits the flow which sets it to true, it will never become
> false even if remote-slot catches up in subsequent cycles, resulting
> in a hang of the API. We shall reset it before starting a new
> iteration in SyncReplicationSlots().
>
> 2)
> We need to clean 'slot_persistence_pending' in reset_syncing_flag() as
> well which is called at the end of API or in failure of API. Even
> though the slot sync worker is not using it, we should clean it up in
> slotsync_worker_onexit() as well.
>

Done.

> 3)
> + /* slot has been persisted, no need to retry */
> + SlotSyncCtx->slot_persistence_pending |= false;
> +
>
> This will not be needed once we reset this flag before each iteration
> in SyncReplicationSlots()
>

Removed.

> 4)
> -synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> +synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot,
> + Oid remote_dbid)
>
> wrconn not used anywhere.
>

Removed.

> 5)
> + bool is_refresh = (target_slot_list!= NIL);
>
> is_refresh is not needed. We can simply check if
> target_slot_list!=NIL, then append it to cmd.
>

Changed.

> 6)
> * If remote_slot_list is NIL, fetches all failover logical slots from the
> * primary server. If remote_slot_list is provided, refreshes only those
> * specific slots with current values from the primary server.
>
> The usage of the word 'refreshing' is confusing. Since we are
> allocating a new remote-list everytime (instead of reusing or
> refreshing previous one), we can simply say:
>
> ------
> Fetches the failover logical slots info from the primary server
>
> If target_slot_list is NIL, fetches all failover logical slots from
> the primary server, otherwise fetches only the ones mentioned in
> target_slot_list
> ------
>
> The 'Parameters:' can also be adjusted accordingly.
>

Done.

>
> 7)
> * Returns a list of RemoteSlot structures. If refreshing and the query fails,
> * returns the original list. Slots that no longer exist on the primary will
> * be removed from the list.
>
> This can be removed.
>

Done.

>
> 8)
> - * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the
> - * slot is valid, that means we have fetched the remote_slot in its
> - * RS_EPHEMERAL state. In such a case, don't sync it; we can always
> - * sync it in the next sync cycle when the remote_slot is persisted
> - * and has valid lsn(s) and xmin values.
> - *
> - * XXX: In future, if we plan to expose 'slot->data.persistency' in
> - * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL
> - * slots in the first place.
> + * Apply ephemeral slot filtering. Skip slots that are in RS_EPHEMERAL
> + * state (invalid LSNs/xmin but not explicitly invalidated).
>
> We can retain the original comment.
>

Done.

> 9)
> Apart from above, there are many changes (alignement, comments etc)
> which are not related to this particular improvement. We can get rid
> of those changes. The patch should have the changes pertaining to
> current improvement alone.
>

I've removed them.
Attaching patch v8 addressing the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachment Content-Type Size
v8-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patch application/octet-stream 18.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2025-08-20 05:24:23 Re: Improve pg_sync_replication_slots() to wait for primary to advance
Previous Message Kirill Reshke 2025-08-20 05:11:15 Re: Test instability when pg_dump orders by OID