| From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> | 
|---|---|
| To: | 'Shlok Kyal' <shlok(dot)kyal(dot)oss(at)gmail(dot)com> | 
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> | 
| Subject: | RE: How can end users know the cause of LR slot sync delays? | 
| Date: | 2025-10-31 05:59:59 | 
| Message-ID: | OSCPR01MB14966C76521B7F51B4CF5808CF5F8A@OSCPR01MB14966.jpnprd01.prod.outlook.com | 
| Lists: | pgsql-hackers | 
Dear Shlok,
Thanks for updating the patch. A few comments:
```
       The reason for the last slot synchronization skip. This field is set only
       for logical slots that are being synced from a primary server (that is,
       those whose <structfield>synced</structfield> field is
       <literal>true</literal>).
```
What happens if the slot has a skip reason and the standby is promoted?
Will the attribute be retained? If so, do we have to add some notes like "sync"?
```
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason,
+                                                       bool acquire_slot)
```
Let's follow the existing code: as in ReplicationSlotSetInactiveSince(), the third
argument can be named `acquire_lock`.
```
+       /*
+        * Update the slot sync related stats in pg_stat_replication_slot when a
+        * slot sync is skipped
+        */
+       if (skip_reason != SS_SKIP_NONE)
+               pgstat_report_replslot_sync_skip(slot);
```
Is it OK to call pgstat_report_replslot_sync_skip() without any locks?
```
		ReplicationSlotAcquire(NameStr(slot->data.name), true, true);
```
Can you clarify why error_if_invalid=true is used here? Other callers in the file
use error_if_invalid=false.
```
+       /* Update the slot sync skip reason */
+       SpinLockAcquire(&slot->mutex);
+       if (slot->slot_sync_skip_reason != skip_reason)
+               slot->slot_sync_skip_reason = skip_reason;
+       SpinLockRelease(&slot->mutex);
```
Now the replication slot can always be acquired. Do we still have to acquire the
spinlock just to read the value? In other words, can we move SpinLockAcquire()
and SpinLockRelease() inside the if block?
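To make the question concrete, here is a minimal standalone sketch of the pattern, not the patch itself. `DemoSlot`, `DemoSkipReason`, and `demo_set_skip_reason()` are hypothetical names, and a pthread mutex stands in for the slot spinlock. It assumes that only the process that has acquired the slot ever writes the field, which is the condition that would make the unlocked read safe; whether that holds in the patch is exactly what needs confirming.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL definitions. */
typedef enum
{
	DEMO_SKIP_NONE,
	DEMO_SKIP_MISSING_WAL
} DemoSkipReason;

typedef struct DemoSlot
{
	pthread_mutex_t mutex;		/* stands in for the slot spinlock */
	DemoSkipReason skip_reason;
} DemoSlot;

static void
demo_slot_init(DemoSlot *slot)
{
	pthread_mutex_init(&slot->mutex, NULL);
	slot->skip_reason = DEMO_SKIP_NONE;
}

/*
 * Set the skip reason, taking the lock only when the value changes.
 * Returns true if the field was updated.
 *
 * ASSUMPTION: the caller is the only writer of skip_reason (it "owns" the
 * slot), so reading the field without the lock cannot race with a write.
 */
static bool
demo_set_skip_reason(DemoSlot *slot, DemoSkipReason reason)
{
	assert(slot != NULL);

	/* Unlocked read: nothing to do if the value is already current. */
	if (slot->skip_reason == reason)
		return false;

	/* Lock only around the write, so concurrent readers see a whole value. */
	pthread_mutex_lock(&slot->mutex);
	slot->skip_reason = reason;
	pthread_mutex_unlock(&slot->mutex);

	return true;
}
```

If another process could also write the field, the unlocked comparison would no longer be safe and the lock would have to cover the read as well.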
```
# Copyright (c) 2024-2025, PostgreSQL Global Development Group
```
I think 2024 can be removed.
```
my $primary = PostgreSQL::Test::Cluster->new('publisher');
```
s/publisher/primary/.
```
# Pause steaming replication connection so that standby can lag behind
unlink($primary->data_dir . '/pg_hba.conf');
$primary->append_conf(
	'pg_hba.conf', qq{
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
});
$primary->restart;
```
I'm not sure this can be called a "pause". How about something like:
```
Update pg_hba.conf and restart the primary to reject streaming replication connections.
WAL records won't be replicated to the standby until .conf is restored.
```
```
# Attempt to sync replication slots while standby is behind
($result, $stdout, $stderr) =
  $standby->psql('postgres', "SELECT pg_sync_replication_slots();");
```
Can you check $stderr to verify that the synchronization failed? I cannot find any
other test that checks the message. Doing it once is enough.
```
$result = $standby->safe_psql(
	'postgres',
	"SELECT slot_sync_skip_reason FROM pg_replication_slots
     WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
);
is($result, 'missing_wal_record', "slot sync skip when standby is behind");
```
I found that the test does this check twice; can we remove the second one?
```
# Cleanup: drop the logical slot and ensure standby catches up
$primary->safe_psql('postgres',
	"SELECT pg_drop_replication_slot('slot_sync')");
$primary->wait_for_replay_catchup($standby);
$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
# Test for case when slot sync is skipped when the remote slot is
# behind the local slot.
$primary->safe_psql('postgres',
	"SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
);
```
Can we use the reset function instead of dropping the slot?
Best regards,
Hayato Kuroda
FUJITSU LIMITED