From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Shlok Kyal' <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
Subject: | RE: How can end users know the cause of LR slot sync delays? |
Date: | 2025-09-17 10:54:39 |
Message-ID: | OSCPR01MB14966A618A8C61EC3DEE486A4F517A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Shlok,
Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
indicate the current status of synchronization, it just shows the history.
I feel approach2 has more information than approach1.
Below are my comments for your patch.
01.
```
+ Number of times the slot sync is skipped.
```
"slot sync" should be "slot synchronization". Same thing can be saied for other
attributes.
02.
```
+ Reason of the last slot sync skip.
```
Possible values must be clarified.
03.
```
+ s.slot_sync_skip_count,
+ s.last_slot_sync_skip,
+ s.slot_sync_skip_reason,
```
Last line has tab-blank but others have space-blank.
04.
```
+typedef enum SlotSyncSkipReason
+{
+ SLOT_SYNC_SKIP_NONE, /* No skip */
+ SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */
+ SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */
+ SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a
+ * consistent snapshot */
+} SlotSyncSkipReason
```
a.
Can we add comment atop the enum?
b.
SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has
not received WALs required by slots, but enum does not clarify it.
How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better naming are very
welcome.
c
s/reach/build/.
05.
```
@@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
remote_slot->name,
LSN_FORMAT_ARGS(latestFlushPtr)));
- return false;
+ /* If the slot is not present on the local */
+ if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ return false;
}
```
Looks like if users try to sync slots via SQL interfaces, the statistics cannot
be updated.
06. update_slot_sync_skip_stats
```
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
I feel no need to update the reason if the slot->slot_sync_skip_reason
and skip_reason are the same.
07. synchronize_one_slot
```
+ /*
+ * If standby is behind remote slot and the synced slot is present on
+ * local.
+ */
+ if (remote_slot->confirmed_lsn > latestFlushPtr)
+ {
+ if (synced)
+ update_slot_sync_skip_stats(slot, skip_reason);
+ return false;
+ }
```
This condition exist in the same function; can we combine?
08. GetSlotSyncSkipReason()
Do we have to do pstrdup() here? I found a similar function get_snapbuild_state_desc(),
and it does not use.
09.
Can you consider a test for added codes?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2025-09-17 10:58:33 | Re: PG 18 release notes draft committed |
Previous Message | Etsuro Fujita | 2025-09-17 10:53:05 | Re: someone else to do the list of acknowledgments |