Re: How can end users know the cause of LR slot sync delays?

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: How can end users know the cause of LR slot sync delays?
Date: 2025-09-24 11:12:13
Message-ID: CANhcyEWKk_O_4pmGMf8mRziuNVrXjPR37+L6dHg1GMOGJ3L5Ww@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh,

Thanks for reviewing the patch.

On Mon, 22 Sept 2025 at 10:59, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Hi,
>
> @@ -646,7 +670,11 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
> remote_dbid)
> remote_slot->name,
> LSN_FORMAT_ARGS(latestFlushPtr)));
>
> - return false;
> + /* If slot is present on the local, update the slot sync skip stats */
> + if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
> + skip_reason = SLOT_SYNC_SKIP_STANDBY_BEHIND;
> + else
> + return false;
>
> With this change, you’re likely enforcing sync slot creation, whereas
> earlier that might not have been the case. This introduces a
> behavioral change, which may not be well received.
>
> --
I have fixed it in latest version of patch

>
> I think we can avoid passing skip_reason as a new argument to
> update_local_synced_slot(). It only needs to be passed to
> update_and_persist_local_synced_slot(). When
> update_local_synced_slot() is invoked from within
> update_and_persist_local_synced_slot(), we can simply rely on the two
> flags, remote_slot_precedes and found_consistent_snapshot and set the
> skip_reason accordingly, thoughts?
>
> If update_local_synced_slot is being called from any other place that
> means the slot is already persisted.
>
I came up with a solution on similar lines as above in the attached patch.
And also as per Amit's comment in [1], I have kept the behaviour to
show skip reason even for persistent slots.

> --
>
> +typedef enum SlotSyncSkipReason
> +{
> + SLOT_SYNC_SKIP_NONE, /* No skip */
> + SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
> + SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
> + SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
> + * consistent snapshot */
> +} SlotSyncSkipReason;
> +
>
> I would suggest shortening the enum names like maybe SS_SKIP_NONE
> instead of SLOT_SYNC_SKIP_NONE.
>
Fixed it.

I have attached the latest patch [2].

[1]: https://www.postgresql.org/message-id/CAA4eK1%2B6UsZiN0GnRNue_Vs8007jdcDFetNq%2BapubHcrqzjwpQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CANhcyEVFZN2Mkjs0QHshKm2_3AkQ0eufjkD12eL2MeuVkPyGbw%40mail.gmail.com

Thanks,
Shlok Kyal

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2025-09-24 11:12:44 Re: REPACK and naming
Previous Message Shlok Kyal 2025-09-24 11:10:40 Re: How can end users know the cause of LR slot sync delays?