Re: How can end users know the cause of LR slot sync delays?

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Subject: Re: How can end users know the cause of LR slot sync delays?
Date: 2025-09-24 11:09:33
Message-ID: CANhcyEVFZN2Mkjs0QHshKm2_3AkQ0eufjkD12eL2MeuVkPyGbw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 17 Sept 2025 at 16:24, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Shlok,
>
> Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
> indicate the current status of synchronization, it just shows the history.
> I feel approach2 has more information than approach1.
>
I agree with your point. I think the preferred behaviour of
slot_sync_skip_reason is to indicate the current status of
synchronization. And taking account the current behaviour of columns
of views pg_replication_slots and pg_stat_replication_slots, I think
slot_sync_skip_reason is suited for pg_replication_slots view and
I am proceeding ahead with approach2

> Below are my comments for your patch.
>
> 01.
> ```
> + Number of times the slot sync is skipped.
> ```
>
> "slot sync" should be "slot synchronization". Same thing can be saied for other
> attributes.
>
> 02.
> ```
> + Reason of the last slot sync skip.
> ```
>
> Possible values must be clarified.
>
> 03.
> ```
> + s.slot_sync_skip_count,
> + s.last_slot_sync_skip,
> + s.slot_sync_skip_reason,
> ```
>
> Last line has tab-blank but others have space-blank.
>
> 04.
> ```
> +typedef enum SlotSyncSkipReason
> +{
> + SLOT_SYNC_SKIP_NONE, /* No skip */
> + SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
> + SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
> + SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
> + * consistent snapshot */
> +} SlotSyncSkipReason
> ```
>
> a.
> Can we add comment atop the enum?
>
> b.
> SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has
> not received WALs required by slots, but enum does not clarify it.
> How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better naming are very
> welcome.
>
> c
> s/reach/build/.
>
> 05.
> ```
> @@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> remote_slot->name,
> LSN_FORMAT_ARGS(latestFlushPtr)));
>
> - return false;
> + /* If the slot is not present on the local */
> + if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true)))
> + return false;
> }
> ```
>
> Looks like if users try to sync slots via SQL interfaces, the statistics cannot
> be updated.
>
> 06. update_slot_sync_skip_stats
> ```
> + /* Update the slot sync reason */
> + SpinLockAcquire(&slot->mutex);
> + slot->slot_sync_skip_reason = skip_reason;
> + SpinLockRelease(&slot->mutex);
> ```
>
> I feel no need to update the reason if the slot->slot_sync_skip_reason
> and skip_reason are the same.
>
> 07. synchronize_one_slot
> ```
> + /*
> + * If standby is behind remote slot and the synced slot is present on
> + * local.
> + */
> + if (remote_slot->confirmed_lsn > latestFlushPtr)
> + {
> + if (synced)
> + update_slot_sync_skip_stats(slot, skip_reason);
> + return false;
> + }
> ```
>
> This condition exist in the same function; can we combine?
>
> 08. GetSlotSyncSkipReason()
>
> Do we have to do pstrdup() here? I found a similar function get_snapbuild_state_desc(),
> and it does not use.
>
I check similar functions and I think we can remove pstrup. Removed in
the latest patch.

> 09.
>
> Can you consider a test for added codes?
Added test in 0002 patch

I have also addressed remaining comments and attached the latest
version of patch.

Thanks,
Shlok Kyal

Attachment Content-Type Size
v3-0001-Add-stats-related-to-slot-sync-skip.patch application/octet-stream 22.0 KB
v3-0002-Add-test-for-new-stats-for-slot-sync-skip.patch application/octet-stream 6.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2025-09-24 11:10:40 Re: How can end users know the cause of LR slot sync delays?
Previous Message Michael Banck 2025-09-24 10:45:59 Re: GNU/Hurd portability patches