Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-08-19 09:42:05
Message-ID: CAJpy0uAbx2zhFnTr9bUOWM_OwGjx8twqP3xShY2TGHqaL0jYSQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 19, 2025 at 10:55 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> Attaching patch v7 addressing all the above comments.
>

Thank You for the patches. Please find a few comments:

1)
We are not resetting 'slot_persistence_pending' to false anywhere. So
once it hits the flow which sets it to true, it will never become
false even if remote-slot catches up in subsequent cycles, resulting
in a hang of the API. We shall reset it before starting a new
iteration in SyncReplicationSlots().

2)
We need to clean 'slot_persistence_pending' in reset_syncing_flag() as
well which is called at the end of API or in failure of API. Even
though the slot sync worker is not using it, we should clean it up in
slotsync_worker_onexit() as well.

3)
+ /* slot has been persisted, no need to retry */
+ SlotSyncCtx->slot_persistence_pending |= false;
+

This will not be needed once we reset this flag before each iteration
in SyncReplicationSlots()

4)
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot,
+ Oid remote_dbid)

wrconn not used anywhere.

5)
+ bool is_refresh = (target_slot_list!= NIL);

is_refresh is not needed. We can simply check if
target_slot_list!=NIL, then append it to cmd.

6)
* If remote_slot_list is NIL, fetches all failover logical slots from the
* primary server. If remote_slot_list is provided, refreshes only those
* specific slots with current values from the primary server.

The usage of the word 'refreshing' is confusing. Since we are
allocating a new remote-list everytime (instead of reusing or
refreshing previous one), we can simply say:

------
Fetches the failover logical slots info from the primary server

If target_slot_list is NIL, fetches all failover logical slots from
the primary server, otherwise fetches only the ones mentioned in
target_slot_list
------

The 'Parameters:' can also be adjusted accordingly.

7)
* Returns a list of RemoteSlot structures. If refreshing and the query fails,
* returns the original list. Slots that no longer exist on the primary will
* be removed from the list.

This can be removed.

8)
- * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the
- * slot is valid, that means we have fetched the remote_slot in its
- * RS_EPHEMERAL state. In such a case, don't sync it; we can always
- * sync it in the next sync cycle when the remote_slot is persisted
- * and has valid lsn(s) and xmin values.
- *
- * XXX: In future, if we plan to expose 'slot->data.persistency' in
- * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL
- * slots in the first place.
+ * Apply ephemeral slot filtering. Skip slots that are in RS_EPHEMERAL
+ * state (invalid LSNs/xmin but not explicitly invalidated).

We can retain the original comment.

9)
Apart from above, there are many changes (alignement, comments etc)
which are not related to this particular improvement. We can get rid
of those changes. The patch should have the changes pertaining to
current improvement alone.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2025-08-19 09:57:47 Re: Make pgoutput documentation easier to find
Previous Message Álvaro Herrera 2025-08-19 09:31:28 Re: New commitfest app release on August 19th