| From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
|---|---|
| To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
| Subject: | Re: How can end users know the cause of LR slot sync delays? |
| Date: | 2025-11-18 10:36:53 |
| Message-ID: | CANhcyEUcz5LDLM0hvEEjnrNzkLPvnALgSTtbFrYBiJREQ8=nZQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, 14 Nov 2025 at 14:13, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Shlok,
>
> Thanks for updating the patch. Few more comments.
>
> > > > > I’m not sure if this has already been discussed; I couldn’t find any
> > > > > mention of it in the thread. Why don’t we persist
> > > > > 'slot_sync_skip_reason' (it is outside of
> > > > > ReplicationSlotPersistentData)? If a slot wasn’t synced during the
> > > > > last cycle and the server restarts, it would be helpful to know the
> > > > > reason it wasn’t synced prior to the node restart.
> > > > >
> > Actually I did not think in this direction. I think it will be useful
> > to persist 'slot_sync_skip_reason'. I have made the change for the
> > same in the latest patch.
>
> Hmm, I'm wondering it should be written on the disk. Other attributes on the disk
> are essential to decode or replicate changes correctly, but sync status is not
> used for the purpose. Personally considered, slot sync would re-start soon after
> the reboot so that it is OK to start with empty. How about others?
>
> If we want to serialize the info, we should do further tasks:
> - update SLOT_VERSION
> - make the slot dirty then SaveSlotToPath() when the status is updated.
>
I agree with your point. Slot synchronization will restart shortly
after a reboot, so it seems reasonable to begin with an empty state
rather than persisting slot_sync_skip_reason.
For now, I’ve updated the patch so that slot_sync_skip_reason is no
longer persisted; its initialization is kept outside of
ReplicationSlotPersistentData. I’d also like to hear what others
think.
> ```
> +static void
> +update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
> +{
> + Assert(MyReplicationSlot);
> ```
>
> I think no need to require *slot as an argument. We can use the variable to shorten
> like update_local_synced_slot().
>
Fixed
> ```
> # Verify pg_sync_replication_slots is failing
> ok( $stderr =~ /skipping slot synchronization because the received slot sync/,
> 'pg_sync_replication_slots failed as expected');
> ```
>
> This may be matter of taste, but can you check whole of log message? Latter part
> indicates the actual reason.
>
The latter part of the message contains LSN values, which are not
stable across runs. To avoid hard-coding specific LSNs, I matched the
fixed, non-variable parts of the message while still covering the
reason for the failure.
> ```
> # Detach injection point
> $standby->safe_psql(
> 'postgres', q{
> SELECT injection_points_detach('slot-sync-skip');
> SELECT injection_points_wakeup('slot-sync-skip');
> });
> ```
>
> Not mandatory, but you can quit the background session if you release the
> injection point.
Fixed
Apart from the above changes. I have renamed the functions to
consistently use the term 'slotsync' instead of 'slot_sync'
update_slot_sync_skip_stats -> update_slotsync_skip_stats
pgstat_report_replslot_sync_skip -> pgstat_report_replslotsync_skip
I have attached the updated v8 patch with the latest changes.
Thanks,
Shlok Kyal
| Attachment | Content-Type | Size |
|---|---|---|
| v8-0001-Add-stats-related-to-slot-sync-skip.patch | application/octet-stream | 25.5 KB |
| v8-0002-Add-test-for-new-stats-for-slot-sync-skip.patch | application/octet-stream | 7.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2025-11-18 10:44:12 | Re: Report bytes and transactions actually sent downtream |
| Previous Message | Ashutosh Bapat | 2025-11-18 10:35:42 | Re: Report bytes and transactions actually sent downtream |