Re: How can end users know the cause of LR slot sync delays?

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Subject: Re: How can end users know the cause of LR slot sync delays?
Date: 2025-11-26 06:28:41
Message-ID: CANhcyEVBPM8GKaHQgkJ2rrJtp8+cnXbsBRaDtxpoHjXOfg6nww@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 26 Nov 2025 at 10:25, shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Nov 26, 2025 at 9:30 AM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Wed, 26 Nov 2025 at 09:00, Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > Dear Hou,
> > >
> > > > I think we did not sync slots to standby2, so I removed the checks for that.
> > > >
> > > > I also adjusted the test in a way that it cleans up existing slots before starting
> > > > new tests.
> > >
> > > Thanks for updating the patch. I confirmed on my env that your patch could be
> > > applied cleanly and tests were passed. Pgperltidy say nothing for your patch.
> > > Also, I preferred the current style.
> > >
> > > I think it is worth checking on BF to see how they say.
> > >
> >
> > Thanks Amit for pushing the 0001 patch.
> > Thanks Hou-san and Kuroda-san on fixing the test.
> >
> > I have rebased the 0002 patch on the current HEAD.
> >
>
> Thanks. Please find a few comments:
>
> 1)
>
> + The reason for the last slot synchronization skip. This field
> is set only
> + for logical slots that are being synchronized from a primary
> server (that
> + is, those whose <structfield>synced</structfield> field is
> + <literal>true</literal>). The value of this column has no meaning on the
> + primary server; it defaults to <literal>none</literal> for all
> slots, but
> + may (if leftover from a promoted standby) also have a value other than
> + <literal>none</literal>. Possible values are:
>
> We can make this similar to existing fields (slotsync_skip_count and
> slotsync_skip_at):
>
> Slot synchronization occurs only on standby servers and thus this column has
> no meaning on the primary server.
>
> 2)
> Doc on possible values of slotsync_skip_reason can be improved. Example
>
> + <literal>remote_behind</literal> means that the last slot
> + synchronization was skipped because the slot is ahead of the
> + corresponding failover slot on the primary.
>
> We can get rid of 'last slot synchronization was skipped' from all the
> reasons. (See 'invalidation_reason' possible values for reference, it
> does not mention 'was invalidated' in any).
>
> 3)
> + <literal>wal_not_flushed</literal> means that the last slot
> + synchronization was skipped because the standby had not flushed the
> + WAL corresponding to the confirmed flush position on the remote slot.
>
> I am not sure if we need to mention 'confirmed flush position'. Shall we say:
>
> '.....because the standby had not flushed the WAL corresponding to the
> position reserved on the remote slot'.
> Thoughts?
>
I think the suggested wording would be more clear to understand for
the user. Added the change.

> 4)
> + <literal>none</literal> means that the last slot synchronization
> + completed successfully.
>
> Do we even need to mention 'none' in doc? 'invalidation_reason' does
> not mention it.
>
> 5)
> postgres=# select slot_name, invalidation_reason, slotsync_skip_reason
> from pg_replication_slots;
> slot_name | invalidation_reason | slotsync_skip_reason
> ----------------+---------------------+----------------------
> failover_slot2 | | none
>
> Shall we keep 'slotsync_skip_reason' as NULL instead of 'none' similar
> to invalidation_reason. Thoughts?
>
I agree with your suggestion. I have removed the 'none' value and used
NULL instead.

> 6)
> If we plan to keep slotsync_skip_reason as NULL instead of 'none' for
> non-skipped cases (above comment), then below code can be optimised as
> we do not need to update 'none' as stats.
> 'skip_reason' and last update() call can then be removed and we can
> simply call update_slotsync_skip_stats() instead of 'skip_reason =
> SS_SKIP_NO_CONSISTENT_SNAPSHOT'.
>
> update_local_synced_slot():
>
> + SlotSyncSkipReason skip_reason = SS_SKIP_NONE;
> + update_slotsync_skip_stats(SS_SKIP_REMOTE_BEHIND);
> + skip_reason = SS_SKIP_NO_CONSISTENT_SNAPSHOT;
> + /* Update slot sync skip stats */
> + update_slotsync_skip_stats(skip_reason);
>
I think we need this change even if we use NULL instead of 'none'.
This change ensures that the slot sync reason is set to NULL if slot
sync is successful.

> 7)
>
> +static char *
> +GetSlotSyncSkipReason(SlotSyncSkipReason reason)
>
> We are passing 'SlotSyncSkipReason' and function name says
> 'GetSlotSyncSkipReason', looks confusing.
> Shall we rename function name to GetSlotSyncSkipString or
> GetSlotSyncSkipReasonName (similar to GetSlotInvalidationCauseName)
>
> 8)
>
> + * A slotsync skip typically occurs only for temporary slots. For
> + * persistent slots it is extremely rare (e.g., cases like
> + * SS_SKIP_WAL_NOT_FLUSHED or SS_SKIP_REMOTE_BEHIND). Also, temporary
> + * slots are dropped after server restart, so there is no value in
> + * persisting the slotsync_skip_reason.
> + */
> + SlotSyncSkipReason slotsync_skip_reason;
>
> I feel, 'Also-->Since' will make more sense here.
>
>
> 9)
> In doc for slotsync_skip_count and slotsync_skip_at
>
> + Slot
> + synchronization occur only on standby servers and thus this column has
> + no meaning on the primary server.
>
> occur --> occurs

I have also addressed the remaining comments and attached the updated patch.

Thanks,
Shlok Kyal

Attachment Content-Type Size
v13-0001-Add-slotsync_skip_reason-to-pg_replication_slots.patch application/octet-stream 15.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2025-11-26 06:33:31 Re: Some optimizations for COALESCE expressions during constant folding
Previous Message Michael Paquier 2025-11-26 06:16:54 Re: Extended Statistics set/restore/clear functions.