From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Shlok Kyal' <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
Subject: | RE: How can end users know the cause of LR slot sync delays? |
Date: | 2025-09-30 12:51:58 |
Message-ID: | OSCPR01MB149667826BC79248A20999D1BF51AA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Shlok,
Thanks for updating the patch. Here are my comments.
01.
```
+ /* Update the slot sync reason */
+ SpinLockAcquire(&slot->mutex);
+ if (slot->slot_sync_skip_reason != skip_reason)
+ slot->slot_sync_skip_reason = skip_reason;
+ SpinLockRelease(&slot->mutex);
```
Per my understanding, spinlock is acquired when the attribute on the shared memory
is updating. Can you check other parts and follow the rukle?
02.
```
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
```
Same as 1.
03.
```
```
04.
```
+#ifdef USE_INJECTION_POINTS
+ INJECTION_POINT("slot-sync-skip", NULL);
+#endif
```
No need do do #ifdef here. INJECTION_POINT() itself checks internally.
05.
```
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('publisher');
+$primary->init(
+ allows_streaming => 'logical',
+ auth_extra => [ '--create-role' => 'repl_role' ]);
```
Do we have to create repl_role? I'm not sure where it's used.
06.
```
+$primary->append_conf(
+ 'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
```
Do we have to set max_prepared_transactions? PREPARE command is not used.
07.
```
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points))
```
We must check whether the injection_points is available or not. See 047_checkpoint_physical_slot.pl.
08.
```
+# Initialize standby from primary backup
+my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby1->init_from_backup(
+ $primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
```
To clarify, is there a reason why we set has_restoring? Can we remove it?
09.
```
+my $connstr_1 = $primary->connstr;
```
Since this is an only connection string in the test, suffix _1 is not needed.
10.
```
+# Simulate standby connection failure by modifying pg_hba.conf
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf('pg_hba.conf',
+ qq{local all all trust}
+);
```
What if the system does not have Unix domain socket? I'm afraid all connections
could be brocked in this case.
11.
```
+# Attempt to sync replication slots while standby is behind
+$standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
```
When the command can be failed, can you set a error message from the command to
a variable? Otherwise an ERROR is output to the result file, which is suprising.
```
psql:<stdin>:1: ERROR: skipping slot synchronization because the received slot sync LSN 0/03019E58 for slot "slot_sync" is ahead of the standby position 0/030000D8
[17:23:21.495](0.384s) ok 2 - slot sync skip when standby is behind
```
12.
```
+# Restore connectivity between primary and standby
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+ 'pg_hba.conf',
+ qq{
+local all all trust
+local replication all trust
+});
```
Same as 10. Also, no need to do unlink() here.
13.
```
+# Create a new logical slot for testing injection point
+$primary->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
```
Before here can you add a description what you would test?
14.
```
# Create a physical replication slot on primary for standby
$primary->psql('postgres',
q{SELECT pg_create_physical_replication_slot('sb1_slot');});
```
Use safe_psql instead of psql().
15.
```
- if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ if (slot || (slot = SearchNamedReplicationSlot(remote_slot->name, true)))
```
Is there a possibility that slot is not NULL? It is used when confirmed_flush exceeds
the latestFlushPtr, but in this case the function cannot reach here.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From | Date | Subject | |
---|---|---|---|
Next Message | Andrey Borodin | 2025-09-30 12:59:49 | Re: Remove custom redundant full page write description from GIN |
Previous Message | Tomas Vondra | 2025-09-30 12:42:18 | Re: Proposal: Adding compression of temporary files |