RE: Synchronizing slots from primary to standby

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'shveta malik' <shveta(dot)malik(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-12-21 13:07:46
Message-ID: TY3PR01MB98893274D5A4FD4F86CC04A0F595A@TY3PR01MB9889.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Shveta,

Thanks for updating the patch! Here is my comments for v52-0002.

~~~~~
system-views.sgml

01.

```
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>sync_state</structfield> <type>char</type>
+ </para>
+ <para>
+ Defines slot synchronization state. This is meaningful on the physical
+ standby which has configured <xref linkend="guc-enable-syncslot"/> = true.
+ Possible values are:
+ <itemizedlist>
+ <listitem>
+ <para><literal>n</literal> = none for user created slots,
...
```

Hmm. I'm not sure why we must show a single character to a user. I'm OK for
pg_subscription.srsubstate because it is a "catalog" - the actual value would be
recorded in the heap. But pg_replication_slot is just a view so that we can replace
internal representations to other strings. E.g., pg_replication_slots.wal_status.
How about using {none, initialized, ready} or something?

~~~~~
postmaster.c

02. bgworker_should_start_now

```
+ if (start_time == BgWorkerStart_ConsistentState_HotStandby &&
+ pmState != PM_RUN)
+ return true;
```

I'm not sure the second condition is really needed. The line will be executed when
pmState is PM_HOT_STANDBY. Is there a possibility that pmState is changed around here?

~~~~~
libpqwalreceiver.c

03. PQWalReceiverFunctions

```
+ .walrcv_get_dbname_from_conninfo = libpqrcv_get_dbname_from_conninfo,
```

Just to confirm - is there a rule for ordering?

~~~~~
slotsync.c

04. SlotSyncWorkerCtx

```
typedef struct SlotSyncWorkerCtx
{
pid_t pid;
slock_t mutex;
} SlotSyncWorkerCtx;

SlotSyncWorkerCtx *SlotSyncWorker = NULL;
```

Per other files like launcher.c, should we use a name like "SlotSyncWorkerCtxStruct"?

05. SlotSyncWorkerRegister()

Your coding will work well, but there is another approach which validates
slotsync parameters here. In this case, the postmaster should exit ASAP. This can
notify that there are some wrong settings to users earlier. Thought?

06. wait_for_primary_slot_catchup

```
+ CHECK_FOR_INTERRUPTS();
+
+ /* Handle any termination request if any */
+ ProcessSlotSyncInterrupts(wrconn);
```

ProcessSlotSyncInterrupts() also has CHECK_FOR_INTERRUPTS(), so no need to call.

07. wait_for_primary_slot_catchup

```
+ /*
+ * XXX: Is waiting for 2 seconds before retrying enough or more or
+ * less?
+ */
+ rc = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ 2000L,
+ WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP);
+
+ ResetLatch(MyLatch);
+
+ /* Emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
```

Is there any reasons not to use WL_EXIT_ON_PM_DEATH event? If not, you can use.

08. synchronize_slots

```
+ SpinLockAcquire(&WalRcv->mutex);
+ if (!WalRcv ||
+ (WalRcv->slotname[0] == '\0') ||
+ XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
+ {
...
```

Assuming that WalRcv is still NULL. In this case, does the first SpinLockAcquire()
lead a segmentation fault?

09. synchronize_slots

```
+ elog(DEBUG2, "slot sync worker's query:%s \n", s.data);
```

The query is not dynamical one, so I think no need to print even if the debug
mode.

10. synchronize_one_slot

IIUC, this function can synchronize slots even if the used plugin on primary is
not installed on the secondary server. If the slot is created by the slotsync
worker, users will recognize it after the server is promoted and the decode is
starting. I felt it is not good specification. Can we detect in the validation
phase?

~~~~~
not the source code

11.

I tested the typical case - promoting a publisher from a below diagram.
A physical replication slot "physical" was specified as standby_slot_names.

```
node A (primary) --> node B (secondary)
|
|
node C (subscriber)
```

And after the promoting, below lines were periodically output on logfiles for
node B and C.

```
WARNING: replication slot "physical" specified in parameter "standby_slot_names" does not exist, ignoring
```

Do you have idea to suppress the warning? IIUC it is a normal behavior of the
walsender so that we cannot avoid the periodical outputs.

The steps of the test was as follows:

1. stop the node A via pg_ctl stop
2. promota the node B via pg_ctl promote
3. change the connection string of the subscription via ALTER SUBSCRIPTION ... CONNECTION ...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2023-12-21 13:14:19 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Previous Message Pavel Stehule 2023-12-21 13:03:38 Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation