| From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
| Subject: | Re: How can end users know the cause of LR slot sync delays? |
| Date: | 2025-11-21 10:12:48 |
| Message-ID: | CANhcyEWm2M+OgrnEZDLt8K91+N12SvTLYq6QMKgbmEWEukV60w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, 21 Nov 2025 at 09:58, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Nov 21, 2025 at 8:52 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 18, 2025 at 4:07 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > >
> > > On Fri, 14 Nov 2025 at 14:13, Hayato Kuroda (Fujitsu)
> > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > >
> > > > Dear Shlok,
> > > >
> > > > Thanks for updating the patch. Few more comments.
> > > >
> > > > > > > > I’m not sure if this has already been discussed; I couldn’t find any
> > > > > > > > mention of it in the thread. Why don’t we persist
> > > > > > > > 'slot_sync_skip_reason' (it is outside of
> > > > > > > > ReplicationSlotPersistentData)? If a slot wasn’t synced during the
> > > > > > > > last cycle and the server restarts, it would be helpful to know the
> > > > > > > > reason it wasn’t synced prior to the node restart.
> > > > > > > >
> > > > > Actually I did not think in this direction. I think it will be useful
> > > > > to persist 'slot_sync_skip_reason'. I have made the change for the
> > > > > same in the latest patch.
> > > >
> > > > Hmm, I'm wondering it should be written on the disk. Other attributes on the disk
> > > > are essential to decode or replicate changes correctly, but sync status is not
> > > > used for the purpose. Personally considered, slot sync would re-start soon after
> > > > the reboot so that it is OK to start with empty. How about others?
> > > >
> > > > If we want to serialize the info, we should do further tasks:
> > > > - update SLOT_VERSION
> > > > - make the slot dirty then SaveSlotToPath() when the status is updated.
> > > >
> > > I agree with your point. Slot synchronization will restart shortly
> > > after a reboot, so it seems reasonable to begin with an empty state
> > > rather than persisting slot_sync_skip_reason.
> > > For now, I’ve updated the patch so that slot_sync_skip_reason is no
> > > longer persisted; its initialization is kept outside of
> > > ReplicationSlotPersistentData. I’d also like to hear what others
> > > think.
> > >
> >
> > Users may even use an API to synchronize the slots rather than
> > slotsync worker. In that case synchronization won't start immediately
> > after server-restart.
> >
>
> But I think after restart in most cases, the slot will be created
> fresh as we persist the slot for the first time only when sync is
> successful. Now, when the standby has not flushed the WAL
> corresponding to remote_lsn (SS_SKIP_WAL_NOT_FLUSHED), the slotsync
> can be skipped even for persisted slots but that should be rare and we
> anyway won't be able to persist the slot_skip reason in other cases as
> slot itself won't be persisted by that time. So, I feel keeping the
> slot_sync_skip_reason in memory is sufficient.
>
I agree.
I have added a comment why we decided to keep slotsync_skip_reason in_memory
Attached the latest patch in [1].
[1]: https://www.postgresql.org/message-id/CANhcyEUiY4ENuoi7kZSsLJFLn6yA_-oPCKrek%3DBaMfFfY3%3DP1w%40mail.gmail.com
Thanks
Shlok Kyal
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Shlok Kyal | 2025-11-21 10:13:00 | Re: How can end users know the cause of LR slot sync delays? |
| Previous Message | Shlok Kyal | 2025-11-21 10:11:44 | Re: How can end users know the cause of LR slot sync delays? |