From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
Subject: | Re: How can end users know the cause of LR slot sync delays? |
Date: | 2025-10-15 10:45:01 |
Message-ID: | CANhcyEXfAcmmfrYde8cTfiAM0EGs6pOqx=Gb8KAiA1PfQUvFjw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 1 Oct 2025 at 08:26, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Shlok,
>
> > Thanks for updating the patch. Here are my comments.
>
> I found one more comment.
>
> ```
> + /*
> + * If found_consistent_snapshot is not NULL and a consistent snapshot is
> + * found set the slot sync skip reason to none. Else, if consistent
> + * snapshot is not found the stats will be updated in the function
> + * update_and_persist_local_synced_slot
> + */
> + if (!found_consistent_snapshot || *found_consistent_snapshot)
> + update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
> ```
>
> I think the condition is confusing; in code level there is a path that
> found_consistent_snapshot is NULL but synchronization happened. (Not sure it is
> possible though).
>
> I think it is better to put update_slot_sync_skip_stats() near the sync part.
> If the snapshot exists from the beginning, it can be done unconditionally,
> otherwise we can check again. Attached .diffs file implements it.
>
I agree that the condition can be confusing. I checked your patch and
have a concern.
For the change:
```
LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn,
found_consistent_snapshot);
+ /* Update the slot sync skip reason if snapshot could be created */
+ if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
+ {
+ Assert(!found_consistent_snapshot ||
+ *found_consistent_snapshot);
+ update_slot_sync_skip_stats(slot, SS_SKIP_NONE);
+ }
+
```
I debugged and noticed that, on a successful sync slot run, in some
cases we do not enter the if condition here which can lead to
misleading slot_sync_skip_reason.
I noticed that even if the synced slot is advanced and
'found_consistent_snapshot' set as true,
SnapBuildSnapshotExists(remote_slot->restart_lsn) can return false.
As far as I understand, LogicalSlotAdvanceAndCheckSnapState will
advance the slot but will not serialize the snapshot and hence
SnapBuildSnapshotExists can return false.
I have also added a test case in 049_slot_skip_stats named
"slot_sync_skip_reason is reset after successful sync" which can
reproduce it.
I think we can update the stats when "slot->data.confirmed_flush ==
remote_slot->confirmed_lsn" instead of checking
'SnapBuildSnapshotExists'. Thoughts?
Thanks,
Shlok Kyal
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-10-15 10:56:24 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Previous Message | Shlok Kyal | 2025-10-15 10:41:33 | Re: How can end users know the cause of LR slot sync delays? |