From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date: | 2025-08-19 09:42:05 |
Message-ID: | CAJpy0uAbx2zhFnTr9bUOWM_OwGjx8twqP3xShY2TGHqaL0jYSQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Aug 19, 2025 at 10:55 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> Attaching patch v7 addressing all the above comments.
>
Thank You for the patches. Please find a few comments:
1)
We are not resetting 'slot_persistence_pending' to false anywhere. So
once it hits the flow which sets it to true, it will never become
false even if remote-slot catches up in subsequent cycles, resulting
in a hang of the API. We shall reset it before starting a new
iteration in SyncReplicationSlots().
2)
We need to clean 'slot_persistence_pending' in reset_syncing_flag() as
well which is called at the end of API or in failure of API. Even
though the slot sync worker is not using it, we should clean it up in
slotsync_worker_onexit() as well.
3)
+ /* slot has been persisted, no need to retry */
+ SlotSyncCtx->slot_persistence_pending |= false;
+
This will not be needed once we reset this flag before each iteration
in SyncReplicationSlots()
4)
-synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+synchronize_one_slot(WalReceiverConn * wrconn, RemoteSlot * remote_slot,
+ Oid remote_dbid)
wrconn not used anywhere.
5)
+ bool is_refresh = (target_slot_list!= NIL);
is_refresh is not needed. We can simply check if
target_slot_list!=NIL, then append it to cmd.
6)
* If remote_slot_list is NIL, fetches all failover logical slots from the
* primary server. If remote_slot_list is provided, refreshes only those
* specific slots with current values from the primary server.
The usage of the word 'refreshing' is confusing. Since we are
allocating a new remote-list everytime (instead of reusing or
refreshing previous one), we can simply say:
------
Fetches the failover logical slots info from the primary server
If target_slot_list is NIL, fetches all failover logical slots from
the primary server, otherwise fetches only the ones mentioned in
target_slot_list
------
The 'Parameters:' can also be adjusted accordingly.
7)
* Returns a list of RemoteSlot structures. If refreshing and the query fails,
* returns the original list. Slots that no longer exist on the primary will
* be removed from the list.
This can be removed.
8)
- * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the
- * slot is valid, that means we have fetched the remote_slot in its
- * RS_EPHEMERAL state. In such a case, don't sync it; we can always
- * sync it in the next sync cycle when the remote_slot is persisted
- * and has valid lsn(s) and xmin values.
- *
- * XXX: In future, if we plan to expose 'slot->data.persistency' in
- * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL
- * slots in the first place.
+ * Apply ephemeral slot filtering. Skip slots that are in RS_EPHEMERAL
+ * state (invalid LSNs/xmin but not explicitly invalidated).
We can retain the original comment.
9)
Apart from above, there are many changes (alignement, comments etc)
which are not related to this particular improvement. We can get rid
of those changes. The patch should have the changes pertaining to
current improvement alone.
thanks
Shveta
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2025-08-19 09:57:47 | Re: Make pgoutput documentation easier to find |
Previous Message | Álvaro Herrera | 2025-08-19 09:31:28 | Re: New commitfest app release on August 19th |