Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-08-14 06:43:59
Message-ID: CAJpy0uAB0NxV8MuoKOJr09HJ-iMyRO41VXyf3bPeY74SPVBdYw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 14, 2025 at 7:28 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
>
> Patch v6 attached.
>

Thanks Ajin. Please find my comments:

1)
SyncReplicationSlots:
+ remote_slots = fetch_or_refresh_remote_slots(wrconn, NIL);
+
+ /* Retry until all slots are sync ready atleast */
+ for (;;)
+ {
+ bool some_slot_updated = false;
+
+ /*
+ * Refresh the remote slot data. We keep using the original slot
+ * list, even if some slots are already sync ready, so that all
+ * slots are updated with the latest status from the primary.
+ */
+ remote_slots = fetch_or_refresh_remote_slots(wrconn, remote_slots);

When the API begins, it seems we fetch the remote slot list twice
before we even sync it once. We can get rid of the
'fetch_or_refresh_remote_slots' call outside the loop and retain the
one inside. On the first call, remote_slots will be NIL and thus it
will fetch all slots; in subsequent calls, it will be the populated
list.
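The suggested shape would be roughly (a sketch based on the quoted
patch, untested):

```c
List   *remote_slots = NIL;

/* Retry until all slots are at least sync ready */
for (;;)
{
    bool    some_slot_updated = false;

    /*
     * NIL on the first iteration fetches all failover slots; on later
     * iterations the populated list is refreshed from the primary.
     */
    remote_slots = fetch_or_refresh_remote_slots(wrconn, remote_slots);

    /* ... sync the slots ... */
}
```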

2)
SyncReplicationSlots:
+ /*
+ * The syscache access in fetch_or_refresh_remote_slots() needs a
+ * transaction env.
+ */
+ if (!IsTransactionState()) {
+ StartTransactionCommand();
+ started_tx = true;
+ }

+ if (started_tx)
+ CommitTransactionCommand();

Shall we move these two calls inside fetch_or_refresh_remote_slots()
(for both the worker and the API flow), similar to how
validate_remote_info() also has them inside?
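For example (sketch, untested; the fetch logic itself is elided):

```c
static List *
fetch_or_refresh_remote_slots(WalReceiverConn *wrconn, List *remote_slot_list)
{
    bool    started_tx = false;

    /* The syscache access below needs a transaction environment. */
    if (!IsTransactionState())
    {
        StartTransactionCommand();
        started_tx = true;
    }

    /* ... run the remote query and build the slot list ... */

    if (started_tx)
        CommitTransactionCommand();

    return remote_slot_list;
}
```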

3)
SyncReplicationSlots:
+ /* Done if all slots are atleast sync ready */
+ if (!SlotSyncCtx->slot_not_persisted)
+ break;
+ else
+ {
+ /* wait for 2 seconds before retrying */
+ wait_for_slot_activity(some_slot_updated, true);

No need to have an 'else' block here. The code can follow without the
'else', because the 'if', when true, breaks out of the loop.
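i.e.:

```c
/* Done if all slots are at least sync ready */
if (!SlotSyncCtx->slot_not_persisted)
    break;

/* Wait for 2 seconds before retrying */
wait_for_slot_activity(some_slot_updated, true);
```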

4)
'fetch_or_refresh_remote_slots' can simply be renamed to
'fetch_remote_slots', and a comment can describe the extra argument,
because ultimately we are re-fetching some/all slots in both cases.

5)
In the case of the API, wait_for_slot_activity() does not change its
wait time based on 'some_slot_updated'. I think we can pull 'WaitLatch,
ResetLatch' into the API function itself and let's not change the
worker's wait_for_slot_activity().
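Something along these lines (sketch, untested; the flags and wait
event name follow the existing slotsync worker code):

```c
/*
 * In the API loop, wait a fixed 2 seconds rather than the worker's
 * adaptive naptime.
 */
rc = WaitLatch(MyLatch,
               WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
               2000L,
               WAIT_EVENT_REPLICATION_SLOTSYNC_MAIN);

if (rc & WL_LATCH_SET)
    ResetLatch(MyLatch);
```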

6)
fetch_or_refresh_remote_slots:
+ {
+ if (is_refresh)
+ {
+ ereport(WARNING,
+ errmsg("could not fetch updated failover logical slots info"
+ " from the primary server: %s",
+ res->err));
+ pfree(query.data);
+ return remote_slot_list; /* Return original list on refresh failure */
+ }
+ else
+ {
+ ereport(ERROR,
+ errmsg("could not fetch failover logical slots info from the primary server: %s",
+ res->err));
+ }
+ }

I think there is no need for different behaviour here between the
worker and the API. Since the worker errors out here, we can make the
API error out as well.

7)
+fetch_or_refresh_remote_slots(WalReceiverConn *wrconn, List *remote_slot_list)

We can name the argument 'target_slot_list' and rename
'updated_slot_list' to 'remote_slot_list'.

8)
+ /* If refreshing, free the original list structures */
+ if (is_refresh)
+ {
+ foreach(lc, remote_slot_list)
+ {
+ RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
+ pfree(old_slot);
+ }
+ list_free(remote_slot_list);
+ }

We can get rid of 'is_refresh' and simply free the list when
'target_slot_list != NIL'. We can use list_free_deep() instead of
freeing each element. Having said that, it looks slightly odd to
free the list in this function; I will think more here. Meanwhile, we
can do this.
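i.e. something like (sketch, untested):

```c
/*
 * Free the previously fetched list, if any; list_free_deep() frees
 * each RemoteSlot element as well as the list cells.
 */
if (target_slot_list != NIL)
    list_free_deep(target_slot_list);
```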

9)
-update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
+update_and_persist_local_synced_slot(WalReceiverConn * wrconn,
+ RemoteSlot * remote_slot, Oid remote_dbid)

We can get rid of wrconn as we are not using it. The same applies to
the wrconn argument of synchronize_one_slot().

10)
+ /* used by pg_sync_replication_slots() API only */
+ bool slot_not_persisted;

We can move the comment outside the structure member: first describe
what the member is, and then add the above line.

11)
+ SlotSyncCtx->slot_not_persisted = false;

This may overwrite the 'slot_not_persisted' set for a previous slot
and ultimately make it 'false' at the end of the cycle even though we
had a few not-persisted slots at the beginning of the cycle. Should it
be:

SlotSyncCtx->slot_not_persisted |= false;
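A standalone C sketch (illustration only, not patch code) of why plain
assignment can lose an earlier failure while '|=' accumulates it:

```c
#include <stdbool.h>

/*
 * '=' keeps only the status of the last slot processed in the cycle,
 * while '|=' remembers a true seen for any earlier slot.
 */
static bool
final_status_assign(const bool *not_persisted, int nslots)
{
    bool    result = false;

    for (int i = 0; i < nslots; i++)
        result = not_persisted[i];  /* clobbers an earlier true */
    return result;
}

static bool
final_status_accum(const bool *not_persisted, int nslots)
{
    bool    result = false;

    for (int i = 0; i < nslots; i++)
        result |= not_persisted[i]; /* preserves any earlier true */
    return result;
}
```

With slots reporting {true, false, false}, the assignment form ends up
false while the accumulating form stays true.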

12)
Shall we rename this to 'slot_persistence_pending' (based on many
other modules using similar names: detach_pending, send_pending,
callback_pending)?

13)
- errmsg("could not synchronize replication slot \"%s\"",
- remote_slot->name),
- errdetail("Synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
- LSN_FORMAT_ARGS(remote_slot->restart_lsn),
- remote_slot->catalog_xmin,
- LSN_FORMAT_ARGS(slot->data.restart_lsn),
- slot->data.catalog_xmin));
+ errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
+ remote_slot->name),
+ errdetail("Attempting Synchronization could lead to data loss, because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the standby has LSN %X/%08X and catalog xmin %u.",
+ LSN_FORMAT_ARGS(remote_slot->restart_lsn),
+ remote_slot->catalog_xmin,
+ LSN_FORMAT_ARGS(slot->data.restart_lsn),
+ slot->data.catalog_xmin));

We can retain the same message, as it was settled after a lot of
discussion; we can attempt to change it if others comment. The idea is
that since the worker emits it in each subsequent cycle (if such a
situation arises), on the same basis the API can now also do so,
because it too performs multiple cycles now. Earlier I had suggested
changing it for the API based on the 'continuing to wait..' messages,
which are no longer there now.

thanks
Shveta
