From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
Subject: | Re: How can end users know the cause of LR slot sync delays? |
Date: | 2025-09-26 11:35:54 |
Message-ID: | CANhcyEWbY4W3hCDbMbKH4ME-euw60vvg9eLLMMjbG0v2NJGAVg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 24 Sept 2025 at 16:39, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> On Wed, 17 Sept 2025 at 16:24, Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Shlok,
> >
> > Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
> > indicate the current status of synchronization, it just shows the history.
> > I feel approach2 has more information than approach1.
> >
> I agree with your point. I think the preferred behaviour of
> slot_sync_skip_reason is to indicate the current status of
> synchronization. And taking account the current behaviour of columns
> of views pg_replication_slots and pg_stat_replication_slots, I think
> slot_sync_skip_reason is suited for pg_replication_slots view and
> I am proceeding ahead with approach2
>
> > Below are my comments for your patch.
> >
> > 01.
> > ```
> > + Number of times the slot sync is skipped.
> > ```
> >
> > "slot sync" should be "slot synchronization". Same thing can be saied for other
> > attributes.
> >
> > 02.
> > ```
> > + Reason of the last slot sync skip.
> > ```
> >
> > Possible values must be clarified.
> >
> > 03.
> > ```
> > + s.slot_sync_skip_count,
> > + s.last_slot_sync_skip,
> > + s.slot_sync_skip_reason,
> > ```
> >
> > Last line has tab-blank but others have space-blank.
> >
> > 04.
> > ```
> > +typedef enum SlotSyncSkipReason
> > +{
> > + SLOT_SYNC_SKIP_NONE, /* No skip */
> > + SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
> > + SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
> > + SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
> > + * consistent snapshot */
> > +} SlotSyncSkipReason
> > ```
> >
> > a.
> > Can we add comment atop the enum?
> >
> > b.
> > SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has
> > not received WALs required by slots, but enum does not clarify it.
> > How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better naming are very
> > welcome.
> >
> > c
> > s/reach/build/.
> >
> > 05.
> > ```
> > @@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> > remote_slot->name,
> > LSN_FORMAT_ARGS(latestFlushPtr)));
> >
> > - return false;
> > + /* If the slot is not present on the local */
> > + if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true)))
> > + return false;
> > }
> > ```
> >
> > Looks like if users try to sync slots via SQL interfaces, the statistics cannot
> > be updated.
> >
> > 06. update_slot_sync_skip_stats
> > ```
> > + /* Update the slot sync reason */
> > + SpinLockAcquire(&slot->mutex);
> > + slot->slot_sync_skip_reason = skip_reason;
> > + SpinLockRelease(&slot->mutex);
> > ```
> >
> > I feel no need to update the reason if the slot->slot_sync_skip_reason
> > and skip_reason are the same.
> >
> > 07. synchronize_one_slot
> > ```
> > + /*
> > + * If standby is behind remote slot and the synced slot is present on
> > + * local.
> > + */
> > + if (remote_slot->confirmed_lsn > latestFlushPtr)
> > + {
> > + if (synced)
> > + update_slot_sync_skip_stats(slot, skip_reason);
> > + return false;
> > + }
> > ```
> >
> > This condition exist in the same function; can we combine?
> >
> > 08. GetSlotSyncSkipReason()
> >
> > Do we have to do pstrdup() here? I found a similar function get_snapbuild_state_desc(),
> > and it does not use.
> >
> I check similar functions and I think we can remove pstrup. Removed in
> the latest patch.
>
> > 09.
> >
> > Can you consider a test for added codes?
> Added test in 0002 patch
>
> I have also addressed remaining comments and attached the latest
> version of patch.
>
The CF Bot was failing as meson.build was not formatted appropriately.
I have fixed it and made some cosmetic changes in the test file.
I ran the CF tests on my local repository and it is passing all tests.
I have attached the updated version.
Thanks,
Shlok Kyal
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Add-stats-related-to-slot-sync-skip.patch | application/octet-stream | 22.0 KB |
v4-0002-Add-test-for-new-stats-for-slot-sync-skip.patch | application/octet-stream | 7.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kirill Reshke | 2025-09-26 11:39:21 | Re: Remove custom redundant full page write description from GIN |
Previous Message | Michael Paquier | 2025-09-26 11:33:02 | Re: [BUG] temporary file usage report with extended protocol and unnamed portals |