From: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: How can end users know the cause of LR slot sync delays? |
Date: | 2025-09-03 12:22:21 |
Message-ID: | CAE9k0Pmh86ctxaOQ0QZkt0gmg+pJbu34w-maG=NoJXfbR80hoA@mail.gmail.com |
Lists: | pgsql-hackers |
Hi Amit,
On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
> wrote:
> >
> > We have seen cases where slot synchronization gets delayed, for example
> > when the slot is behind the failover standby or vice versa, and the slot
> > sync worker has to wait for one to catch up with the other. During this
> > waiting period, users querying pg_replication_slots can only see whether
> > the slot has been synchronized or not. If it has already synchronized,
> > that’s fine, but if synchronization is taking longer, users would naturally
> > want to understand the reason for the delay.
> >
> > Is there a way for end users to know the cause of slot synchronization
> > delays, so they can take appropriate actions to speed it up?
> >
> > I understand that server logs are emitted in such cases, but logs are
> > not something end users would want to check regularly. Moreover, since
> > logging is configuration-based, relevant messages may sometimes be skipped
> > or suppressed.
> >
>
> Currently, the way to see the reason for sync skip is LOGs but I think
> it is better to add a new column like sync_skip_reason in
> pg_replication_slots. This can show the reasons like
> standby_LSN_ahead_remote_LSN. I think ideally users can compare
> standby's slot LSN/XMIN with remote_slot being synced. Do you have any
> better ideas?
>
>
I have similar thoughts, but for clarity, I’d like to outline some of the
key steps I plan to take:
Step 1: Define an enum covering every reason why a slot sync (and hence
persisting the synced slot) can be skipped.
/*
 * Reasons why a replication slot sync was skipped.
 */
typedef enum ReplicationSlotSyncSkipReason
{
    RS_SYNC_SKIP_NONE = 0,                  /* No skip */
    RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0),  /* Remote slot is behind local
                                             * reserved LSN */
    RS_SYNC_SKIP_DATA_LOSS = (1 << 1),      /* Local slot ahead of remote,
                                             * risk of data loss */
    RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2)     /* Standby could not build a
                                             * consistent snapshot */
} ReplicationSlotSyncSkipReason;
--
Step 2: Introduce a new column in pg_replication_slots to store the skip
reason.
/* Inside pg_replication_slots table */
ReplicationSlotSyncSkipReason slot_sync_skip_reason;
--
Step 3: Add a function that converts the enum to a human-readable string
that can be stored in pg_replication_slots.
/*
 * Convert ReplicationSlotSyncSkipReason bitmask to human-readable string.
 *
 * Returns a palloc'd string; caller is responsible for freeing it.
 */
static char *
replication_slot_sync_skip_reason_str(ReplicationSlotSyncSkipReason reason)
{
    StringInfoData buf;

    initStringInfo(&buf);

    if (reason == RS_SYNC_SKIP_NONE)
    {
        appendStringInfoString(&buf, "none");
        return buf.data;
    }

    if (reason & RS_SYNC_SKIP_REMOTE_BEHIND)
        appendStringInfoString(&buf, "remote_behind|");
    if (reason & RS_SYNC_SKIP_DATA_LOSS)
        appendStringInfoString(&buf, "data_loss|");
    if (reason & RS_SYNC_SKIP_NO_SNAPSHOT)
        appendStringInfoString(&buf, "no_snapshot|");

    /* Remove trailing '|' and keep buf.len consistent with the data */
    if (buf.len > 0 && buf.data[buf.len - 1] == '|')
        buf.data[--buf.len] = '\0';

    return buf.data;
}
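For anyone who wants to try the string format outside the PostgreSQL tree,
here is a rough standalone sketch of the same conversion. It is only an
illustration: skip_reason_to_str and the caller-supplied buffer are
hypothetical and replace the StringInfo/palloc usage above, and the enum is
repeated so that the file compiles on its own.

```c
#include <stdio.h>
#include <string.h>

/* Same bit values as the enum proposed in Step 1. */
typedef enum ReplicationSlotSyncSkipReason
{
    RS_SYNC_SKIP_NONE = 0,
    RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0),
    RS_SYNC_SKIP_DATA_LOSS = (1 << 1),
    RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2)
} ReplicationSlotSyncSkipReason;

/*
 * Hypothetical standalone variant (not part of the proposed patch): writes
 * the '|'-separated reason names into a caller-supplied buffer. Names that
 * do not fit are dropped.
 */
const char *
skip_reason_to_str(int reason, char *buf, size_t buflen)
{
    static const struct
    {
        int         bit;
        const char *name;
    } names[] = {
        {RS_SYNC_SKIP_REMOTE_BEHIND, "remote_behind"},
        {RS_SYNC_SKIP_DATA_LOSS, "data_loss"},
        {RS_SYNC_SKIP_NO_SNAPSHOT, "no_snapshot"},
    };
    size_t      used = 0;

    if (buflen == 0)
        return buf;
    buf[0] = '\0';

    if (reason == RS_SYNC_SKIP_NONE)
    {
        snprintf(buf, buflen, "none");
        return buf;
    }

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
    {
        int         n;

        if ((reason & names[i].bit) == 0)
            continue;

        /* Prefix every name except the first with the separator. */
        n = snprintf(buf + used, buflen - used, "%s%s",
                     used > 0 ? "|" : "", names[i].name);
        if (n < 0 || (size_t) n >= buflen - used)
            break;              /* output truncated, stop here */
        used += (size_t) n;
    }

    return buf;
}
```

Adding the separator before each name, rather than after it, avoids the
trailing-'|' cleanup step entirely; the same idea would work with
appendStringInfo in the in-tree version.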
--
Step 4: Capture slot_sync_skip_reason whenever the relevant LOG messages
are generated, primarily inside update_local_synced_slot or
update_and_persist_local_synced_slot. This value can later be
persisted in the pg_replication_slots catalog.
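To make Step 4 a bit more concrete, here is a simplified, standalone sketch
of recording the reason at the point where the sync is skipped. Everything
in it is hypothetical (DemoSlot, demo_try_sync, the DEMO_* constants), and
the mapping of each LSN comparison to a reason bit is my assumption for
illustration only; the real checks live in the functions named above and
also involve catalog_xmin.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoLsn;       /* stand-in for a 64-bit WAL position */

/* Hypothetical mirror of the Step 1 bit values. */
enum
{
    DEMO_SKIP_NONE = 0,
    DEMO_SKIP_REMOTE_BEHIND = (1 << 0),
    DEMO_SKIP_DATA_LOSS = (1 << 1),
    DEMO_SKIP_NO_SNAPSHOT = (1 << 2)
};

/* Hypothetical, heavily reduced slot state. */
typedef struct DemoSlot
{
    DemoLsn     restart_lsn;
    DemoLsn     confirmed_flush;
    int         sync_skip_reason;   /* bitmask of DEMO_SKIP_* */
} DemoSlot;

/*
 * Try to advance the local slot to the remote slot's position.
 *
 * Returns true if the slot was synced. Otherwise the local positions are
 * left untouched and the reasons are recorded in sync_skip_reason, which is
 * the spot where the existing code would emit its LOG message.
 */
bool
demo_try_sync(DemoSlot *local, const DemoSlot *remote, bool snapshot_ok)
{
    int         reason = DEMO_SKIP_NONE;

    /* Assumed mapping: remote restart position behind the local one. */
    if (remote->restart_lsn < local->restart_lsn)
        reason |= DEMO_SKIP_REMOTE_BEHIND;

    /* Assumed mapping: local confirmed position ahead of the remote one. */
    if (remote->confirmed_flush < local->confirmed_flush)
        reason |= DEMO_SKIP_DATA_LOSS;

    if (!snapshot_ok)
        reason |= DEMO_SKIP_NO_SNAPSHOT;

    /* Always overwrite, so a successful sync clears a stale reason. */
    local->sync_skip_reason = reason;

    if (reason != DEMO_SKIP_NONE)
        return false;

    local->restart_lsn = remote->restart_lsn;
    local->confirmed_flush = remote->confirmed_flush;
    return true;
}
```

The point of the sketch is that the reason is reset on every attempt, so
the column would read "none" again once the slot catches up.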
--
Please let me know if you have any objections. I’ll share the WIP patch in
a few days.
--
With Regards,
Ashutosh Sharma.