Re: How can end users know the cause of LR slot sync delays?

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: How can end users know the cause of LR slot sync delays?
Date: 2025-11-04 09:47:18
Message-ID: CAJpy0uC9WsJhc-qeFmz_JTPjW1vH3Zm+zS6jX0PKTY6vtEp38w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 3, 2025 at 3:14 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> >
> > I have attached the latest patch.
> >
>
> Thanks. I have started going through it.
>
> I’m not sure if this has already been discussed; I couldn’t find any
> mention of it in the thread. Why don’t we persist
> 'slot_sync_skip_reason' (it is outside of
> ReplicationSlotPersistentData)? If a slot wasn’t synced during the
> last cycle and the server restarts, it would be helpful to know the
> reason it wasn’t synced prior to the node restart.
>

Please find a few more comments:

1)
last_slot_sync_skip

Will 'last_slotsync_skip_at' be a better name?

Since we refer worker as slotsync in docs, I feel slotsync seems a
more natural choice than slot_sync. Also '_at' gives clarity that it
is about time rather than a boolean (which currently it seems like).

Same goes for slot_sync_skip_count and slot_sync_skip_reason. Shall
these be slotsync_skip_count and slotsync_skip_reason.

2)
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason
skip_reason,
+ bool acquire_slot)

It looks like there is only one caller that passes acquire_slot as
true, while all others pass false. Instead of keeping the acquire_slot
parameter, will it be better if we remove it and add an
Assert(MyReplication) to ensure the slot is already acquired? We can
add a comment stating that this function expects the slot to be
acquired by the caller. The one caller that currently passes
acquire_slot = true can acquire the slot explicitly before invoking
this function. Thoughts?

3)
In update_and_persist_local_synced_slot(), we get both
'found_consistent_snapshot' and 'remote_slot_precedes' from
update_local_synced_slot(). But skipsync-reason for
'remote_slot_precedes' is updated inside update_local_synced_slot()
while skipsync-reason for '!found_consistent_snapshot' is updated in
caller update_and_persist_local_synced_slot. Is there a reason for
that?

4)
What about the case where the slot is invalidated and sync is skipped?
I do not see any stats for that. See 'Skip the sync of an invalidated
slot' in synchronize_one_slot(). If it is already discussed and
concluded, please add a comment.

thanks
Shveta

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2025-11-04 09:54:37 Re: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs
Previous Message Chao Li 2025-11-04 09:40:19 Re: [PATCH] Add pretty formatting to pg_get_triggerdef