RE: How can end users know the cause of LR slot sync delays?

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Shlok Kyal' <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Subject: RE: How can end users know the cause of LR slot sync delays?
Date: 2025-10-31 05:59:59
Message-ID: OSCPR01MB14966C76521B7F51B4CF5808CF5F8A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Shlok,

Thanks for updating the patch. A few comments:

```
The reason for the last slot synchronization skip. This field is set only
for logical slots that are being synced from a primary server (that is,
those whose <structfield>synced</structfield> field is
<literal>true</literal>).
```

What happens if the slot has a skip reason and the standby is promoted?
Will the attribute be retained? If so, do we have to add some notes like "sync"?

```
+/* Update slot sync skip stats */
+static void
+update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason,
+ bool acquire_slot)
```

Let's follow the existing code, e.g. ReplicationSlotSetInactiveSince(): the third
argument can be named `acquire_lock`.
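
For illustration only, a standalone sketch of the `acquire_lock` convention. `DemoSlot`, `demo_set_skip_reason()` and the pthread mutex are hypothetical stand-ins for the slot and its spinlock; this is not the patch code or PostgreSQL code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a slot whose fields are protected by a mutex. */
typedef struct DemoSlot
{
	pthread_mutex_t mutex;
	int			skip_reason;
} DemoSlot;

/*
 * Set the skip reason. Callers that already hold slot->mutex pass
 * acquire_lock = false; all other callers pass true.
 */
static void
demo_set_skip_reason(DemoSlot *slot, int skip_reason, bool acquire_lock)
{
	assert(slot != NULL);

	if (acquire_lock)
		pthread_mutex_lock(&slot->mutex);

	slot->skip_reason = skip_reason;

	if (acquire_lock)
		pthread_mutex_unlock(&slot->mutex);
}
```

The flag name then says what the function may take on the caller's behalf, which is the point of the naming suggestion.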

```
+ /*
+ * Update the slot sync related stats in pg_stat_replication_slot when a
+ * slot sync is skipped
+ */
+ if (skip_reason != SS_SKIP_NONE)
+ pgstat_report_replslot_sync_skip(slot);
```

Is it OK to call pgstat_report_replslot_sync_skip() without any locks?

```
ReplicationSlotAcquire(NameStr(slot->data.name), true, true);
```

Can you clarify why error_if_invalid=true is used here? Other code in the file
uses error_if_invalid=false.

```
+ /* Update the slot sync skip reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```

Now the replication slot can always be acquired. Do we still have to take the
spinlock even for reading the value? In other words, can we move SpinLockAcquire()
and SpinLockRelease() inside the if block?
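
A standalone sketch of that shape, for illustration only. `SketchSlot`, `sketch_update_skip_reason()`, the pthread mutex and the boolean return value are hypothetical; the sketch assumes that only the process owning the slot ever writes the field, while other readers still take the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a slot whose fields are protected by a mutex. */
typedef struct SketchSlot
{
	pthread_mutex_t mutex;
	int			skip_reason;
} SketchSlot;

/*
 * Assumed to be called only by the single owner of the slot. Since nobody
 * else writes skip_reason, the owner can compare without the lock and take
 * it only for the actual write. Returns true if the value was changed.
 */
static bool
sketch_update_skip_reason(SketchSlot *slot, int skip_reason)
{
	assert(slot != NULL);

	/* Unlocked read: safe only under the single-writer assumption. */
	if (slot->skip_reason == skip_reason)
		return false;

	pthread_mutex_lock(&slot->mutex);
	slot->skip_reason = skip_reason;
	pthread_mutex_unlock(&slot->mutex);

	return true;
}
```

If the single-writer assumption does not hold, the comparison has to stay under the lock as in the quoted hunk.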

```
# Copyright (c) 2024-2025, PostgreSQL Global Development Group
```

I think 2024 can be removed.

```
my $primary = PostgreSQL::Test::Cluster->new('publisher');
```

s/publisher/primary/.

```
# Pause steaming replication connection so that standby can lag behind
unlink($primary->data_dir . '/pg_hba.conf');
$primary->append_conf(
'pg_hba.conf', qq{
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
});
$primary->restart;
```

I'm not sure this can be called a "pause". How about something like:
```
Update pg_hba.conf and restart the primary to reject streaming replication connections.
WAL records won't be replicated to the standby until the configuration is restored.
```

```
# Attempt to sync replication slots while standby is behind
($result, $stdout, $stderr) =
$standby->psql('postgres', "SELECT pg_sync_replication_slots();");
```

Can you check $stderr to verify that the synchronization failed? I cannot find any
other test that checks the message. It is enough to do this once.

```
$result = $standby->safe_psql(
'postgres',
"SELECT slot_sync_skip_reason FROM pg_replication_slots
WHERE slot_name = 'slot_sync' AND synced AND NOT temporary"
);
is($result, 'missing_wal_record', "slot sync skip when standby is behind");
```

I found that the test does this check twice; can we remove the second one?

```
# Cleanup: drop the logical slot and ensure standby catches up
$primary->safe_psql('postgres',
"SELECT pg_drop_replication_slot('slot_sync')");
$primary->wait_for_replay_catchup($standby);

$standby->safe_psql('postgres', "SELECT pg_sync_replication_slots();");

# Test for case when slot sync is skipped when the remote slot is
# behind the local slot.
$primary->safe_psql('postgres',
"SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
);
```

Can we use a reset function instead of dropping the slot?

Best regards,
Hayato Kuroda
FUJITSU LIMITED
