Quick Links

Re: Improve pg_sync_replication_slots() to wait for primary to advance

From:	Ajin Cherian <itsajin(at)gmail(dot)com>
To:	shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date:	2025-08-19 05:24:57
Message-ID:	CAFPTHDYBZytxLaOsU1FyhgNC_BkfXPUQPihNxUB2dYp-GwTEFA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Aug 14, 2025 at 4:44 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Aug 14, 2025 at 7:28 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> >
> > Patch v6 attached.
> >
>
> Thanks Ajin. Please find my comments:
>
> 1)
> SyncReplicationSlots:
> + remote_slots = fetch_or_refresh_remote_slots(wrconn, NIL);
> +
> + /* Retry until all slots are sync ready atleast */
> + for (;;)
> + {
> + bool some_slot_updated = false;
> +
> + /*
> + * Refresh the remote slot data. We keep using the original slot
> + * list, even if some slots are already sync ready, so that all
> + * slots are updated with the latest status from the primary.
> + */
> + remote_slots = fetch_or_refresh_remote_slots(wrconn, remote_slots);
>
> When the API begins, it seems we are fetching remote_list twice
> before we even sync it once. We can get rid of
> 'fetch_or_refresh_remote_slots' from outside the loop and retain the
> inside one. At first call, remote_slots will be NIL and thus it will
> fetch all slots and in subsequent calls, it will be populated one.
>

Fixed.

>
> 2)
> SyncReplicationSlots:
> + /*
> + * The syscache access in fetch_or_refresh_remote_slots() needs a
> + * transaction env.
> + */
> + if (!IsTransactionState()) {
> + StartTransactionCommand();
> + started_tx = true;
> + }
>
> + if (started_tx)
> + CommitTransactionCommand();
>
> Shall we move these 2 inside fetch_or_refresh_remote_slots() (both
> worker and APi flow) similar to how validate_remote_info() also has it
> inside?
>

I tried this but it doesn't work because when the transaction is
committed, the memory allocation for the remote slots are also freed.
So, this needs to be on the outside.

> 3)
> SyncReplicationSlots:
> + /* Done if all slots are atleast sync ready */
> + if (!SlotSyncCtx->slot_not_persisted)
> + break;
> + else
> + {
> + /* wait for 2 seconds before retrying */
> + wait_for_slot_activity(some_slot_updated, true);
>
> No need to have 'else' block here. The code can be put without having
> 'else' because 'if' when true, breaks from the loop.
>

Fixed.

>
> 4)
> 'fetch_or_refresh_remote_slots' can be renamed to 'fetch_remote_slots'
> simply and then a comment can define an extra argument. Because
> ultimately we are re-fetching some/all slots in both cases.
>

Done.

> 5)
> In the case of API, wait_for_slot_activity() does not change its wait
> time based on 'some_slot_updated'. I think we can pull 'WaitLatch,
> ResetLatch' in API-function itself and lets not change worker's
> wait_for_slot_activity().
>

Done.

> 6)
> fetch_or_refresh_remote_slots:
> + {
> + if (is_refresh)
> + {
> + ereport(WARNING,
> + errmsg("could not fetch updated failover logical slots info"
> + " from the primary server: %s",
> + res->err));
> + pfree(query.data);
> + return remote_slot_list; /* Return original list on refresh failure */
> + }
> + else
> + {
> + ereport(ERROR,
> + errmsg("could not fetch failover logical slots info from the primary
> server: %s",
> + res->err));
> + }
> + }
>
> I think there is no need for different behaviour here for worker and
> API. Since worker errors-out here, we can make API also error-out.
>

Fixed.

> 7)
> +fetch_or_refresh_remote_slots(WalReceiverConn *wrconn, List *remote_slot_list)
>
> We can name the argument as 'target_slot_list' and replace the name
> 'updated_slot_list' with 'remote_slot_list'.
>

Fixed.

>
> 8)
> + /* If refreshing, free the original list structures */
> + if (is_refresh)
> + {
> + foreach(lc, remote_slot_list)
> + {
> + RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
> + pfree(old_slot);
> + }
> + list_free(remote_slot_list);
> + }
>
> We can get rid of 'is_refresh' and can simply check if
> 'target_slot_list != NIL', free it. We can use list_free_deep instead
> of freeing each element. Having said that, it looks slightly odd to
> free the list in this function, I will think more here. Meanwhile, we
> can do this.
>

Fixed.

>
> 9)
> -update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> +update_and_persist_local_synced_slot(WalReceiverConn * wrconn,
> + RemoteSlot * remote_slot, Oid remote_dbid)
>
> We can get rid of wrconn as we are not using it. Same with wrconn
> argument for synchronize_one_slot()
>

Done.

> 10)
> + /* used by pg_sync_replication_slots() API only */
> + bool slot_not_persisted;
>
> We can move comment outside structure. We can first define it and then
> say the above line.
>

Done.

>
> 11)
> + SlotSyncCtx->slot_not_persisted = false;
>
> This may overwrite the 'slot_not_persisted' set for the previous slot
> and ultimately make it 'false' at the end of cycle even though we had
> few not-persisted slots in the beginning of cycle. Should it be:
>
> SlotSyncCtx->slot_not_persisted |= false;
>

Fixed.

> 12)
> Shall we rename this to : slot_persistence_pending (based on many
> other modules using similar names: detach_pending, send_pending,
> callback_pending)?
>

Done.

> 13)
> - errmsg("could not synchronize replication slot \"%s\"",
> - remote_slot->name),
> - errdetail("Synchronization could lead to data loss, because the
> remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
> standby has LSN %X/%08X and catalog xmin %u.",
> - LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> - remote_slot->catalog_xmin,
> - LSN_FORMAT_ARGS(slot->data.restart_lsn),
> - slot->data.catalog_xmin));
> + errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
> + remote_slot->name),
> + errdetail("Attempting Synchronization could lead to data loss,
> because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
> but the standby has LSN %X/%08X and catalog xmin %u.",
> + LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> + remote_slot->catalog_xmin,
> + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> + slot->data.catalog_xmin));
>
> We can retain the same message as it was put after a lot of
> discussion. We can attempt to change if others comment. The idea is
> since a worker dumps it in each subsequent cycle (if such a situation
> arises), on the same basis now the API can also do so because it is
> also performing multiple cycles now. Earlier I had suggested changing
> it for API based on messages 'continuing to wait..' which are no
> longer there now.
>

Done.

On Thu, Aug 14, 2025 at 10:44 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Thu, Aug 14, 2025 at 12:14 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> >
> > 8)
> > + /* If refreshing, free the original list structures */
> > + if (is_refresh)
> > + {
> > + foreach(lc, remote_slot_list)
> > + {
> > + RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
> > + pfree(old_slot);
> > + }
> > + list_free(remote_slot_list);
> > + }
> >
> > We can get rid of 'is_refresh' and can simply check if
> > 'target_slot_list != NIL', free it. We can use list_free_deep instead
> > of freeing each element. Having said that, it looks slightly odd to
> > free the list in this function, I will think more here. Meanwhile, we
> > can do this.
> >
>
> +1. The function prologue doesn't mention that the original list is
> deep freed. So a caller may try to access it after this call, which
> will lead to a crash. As a safe programming practice we should let the
> caller free the original list if it is not needed anymore OR modify
> the input list in-place and return it for the convenience of the
> caller like all list_* interfaces. At least we should document this
> behavior in the function prologue. You could also use foreach_ptr
> instead of foreach.
>

I've changed the logic so that it is the responsibility of the caller
to free the list.

> > 13)
> > - errmsg("could not synchronize replication slot \"%s\"",
> > - remote_slot->name),
> > - errdetail("Synchronization could lead to data loss, because the
> > remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
> > standby has LSN %X/%08X and catalog xmin %u.",
> > - LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> > - remote_slot->catalog_xmin,
> > - LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > - slot->data.catalog_xmin));
> > + errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
> > + remote_slot->name),
> > + errdetail("Attempting Synchronization could lead to data loss,
> > because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
> > but the standby has LSN %X/%08X and catalog xmin %u.",
> > + LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> > + remote_slot->catalog_xmin,
> > + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > + slot->data.catalog_xmin));
> >
> > We can retain the same message as it was put after a lot of
> > discussion. We can attempt to change if others comment. The idea is
> > since a worker dumps it in each subsequent cycle (if such a situation
> > arises), on the same basis now the API can also do so because it is
> > also performing multiple cycles now. Earlier I had suggested changing
> > it for API based on messages 'continuing to wait..' which are no
> > longer there now.
>
> Also we usually don't use capital letters at the start of the error
> message. Any reason this is different?
>

Retained the old message.

> Some more
>
> + * When called from pg_sync_replication_slots, use a fixed 2
> + * second wait time.
>
> Function prologue doesn't mention this. Probably the prologue should
> contain only the first sentence there. Rest of the prologues just
> repeat comments in the function. The function is small enough that a
> reader could read the details from the function instead of the
> prologue.
>
> + wait_time = SLOTSYNC_API_NAPTIME_MS;
> + } else {
>
> } else and { should be on separate lines.
>

I've removed the changes in this function and it is now the same as before.

Attaching patch v7 addressing all the above comments.

regards,
Ajin Cherian
Fujitsu Australia

Attachment	Content-Type	Size
v7-0001-Improve-initial-slot-synchronization-in-pg_sync_r.patch	application/octet-stream	25.0 KB

In response to

Re: Improve pg_sync_replication_slots() to wait for primary to advance at 2025-08-14 06:43:59 from shveta malik

Responses

Re: Improve pg_sync_replication_slots() to wait for primary to advance at 2025-08-19 09:42:05 from shveta malik
Re: Improve pg_sync_replication_slots() to wait for primary to advance at 2025-08-19 10:57:29 from Kirill Reshke

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Thomas Munro	2025-08-19 05:31:36	Re: VM corruption on standby
Previous Message	Julien Rouhaud	2025-08-19 05:15:37	Re: Plan caching and serialization for reuse across executions