RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-03-18 05:43:59
Message-ID: TYCPR01MB12077E2ACB82680CD7BAA22F9F52D2@TYCPR01MB12077.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Euler,

Thanks for updating the patch. Here are my comments.
I used Grammarly to proofread sentences.
(The tool strongly recommends to use active voice, but I can ignore for now)

01.

"After a successful run, the state of the target server is analagous to a fresh
logical replication setup."
a/analagous/analogous

02.

"The main difference between the logical replication setup and pg_createsubscriber
is the initial data copy."

Grammarly suggests:
"The initial data copy is the main difference between the logical replication
setup and pg_createsubscriber."

03.

"Only the synchronization phase is done, which ensures each table is brought up
to a synchronized state."

This sentence is not very clear to me. How about:
"pg_createsubscriber does only the synchronization phase, ensuring each table's
replication state is ready."

04.

"The pg_createsubscriber targets large database systems because most of the
execution time is spent making the initial data copy."

Hmm, but initial data sync by logical replication also spends most of time to
make the initial data copy. IIUC bottlenecks are a) this application must stop
and start server several times, and b) only the serial copy works. Can you
clarify them?

05.

It is better to say the internal difference between pg_createsubscriber and the
initial sync by logical replication. For example:
pg_createsubscriber uses a physical replication mechanism to ensure the standby
catches up until a certain point. Then, it converts to the standby to the
subscriber by promoting and creating subscriptions.

06.

"If these are not met an error will be reported."

Grammarly suggests:
"If these are not met, an error will be reported."

07.

"The given target data directory must have the same system identifier than the
source data directory."

Grammarly suggests:
"The given target data directory must have the same system identifier as the
source data directory."

08.

"If a standby server is running on the target data directory or it is a base
backup from the source data directory, system identifiers are the same."

This line is not needed if bullet-style is not used. The line is just a supplement,
not prerequisite.

09.

"The source server must accept connections from the target server. The source server must not be in recovery."

Grammarly suggests:
"The source server must accept connections from the target server and not be in recovery."

10.

"Publications cannot be created in a read-only cluster."

Same as 08, this line is not needed if bullet-style is not used.

11.

"pg_createsubscriber usually starts the target server with different connection
settings during the transformation steps. Hence, connections to target server
might fail."

Grammarly suggests:
"pg_createsubscriber usually starts the target server with different connection
settings during transformation. Hence, connections to the target server might fail."

12.

"During the recovery process,"

Grammarly suggests:
"During recovery,"

13.

"replicated so an error would occur."

Grammarly suggests:
"replicated, so an error would occur."

14.

"It would avoid situations in which WAL files from the source server might be
used by the target server."

Grammarly suggests:
"It would avoid situations in which the target server might use WAL files from
the source server."

15.

"a LSN"

s/a/an

16.

"of write-ahead"

s/of/of the/

17.

"specifies promote"

We can do double-quote for the word promote.

18.

"are also added so it avoids"

Grammarly suggests:
"are added to avoid"

19.

"is accepting read-write transactions"

Grammarly suggests:
"accepts read-write transactions"

20.

New options must be also documented as well. This helps not only users but also
reviewers.
(Sometimes we cannot identify that the implementation is intentinal or not.)

21.

Also, not sure the specification is good. I preferred to specify them by format
string. Because it can reduce the number of arguments and I cannot find use cases
which user want to control the name of objects.

However, your approach has a benefit which users can easily identify the generated
objects by pg_createsubscriber. How do other think?

22.

```
#define BASE_OUTPUT_DIR "pg_createsubscriber_output.d"
```

No one refers the define.

23.

```
} CreateSubscriberOptions;
...
} LogicalRepInfo;
```

Declarations after the "{" are not needed, because we do not do typedef.

22.

While seeing definitions of functions, I found that some pointers are declared
as const, but others are not. E.g., "char *lsn" in setup_recovery() won' be
changed but not the constant. Is it just missing or is there another rule?

23.

```
static int num_dbs = 0;
static int num_pubs = 0;
static int num_subs = 0;
static int num_replslots = 0;
```

I think the name is bit confusing. The number of generating publications/subscriptions/replication slots
are always same as the number of databases. They just indicate the number of
specified.

My idea is num_custom_pubs or something. Thought?

24.

```
/* standby / subscriber data directory */
static char *subscriber_dir = NULL;
```

It is bit strange that only subscriber_dir is a global variable. Caller requires
the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and
main. So, how about makeing `CreateSubscriberOptions opt` to global one?

25.

```
* Replication slots, publications and subscriptions are created. Depending on
* the step it failed, it should remove the already created objects if it is
* possible (sometimes it won't work due to a connection issue).
```

I think it should be specified here that subscriptions won't be removed with the
reason.

26.

```

/*
* If the server is promoted, there is no way to use the current setup
* again. Warn the user that a new replication setup should be done before
* trying again.
*/
```

Per comment 25, we can add a reference like "See comments atop the function"

27.

usage() was not updated based on recent changes.

28.

```
if (strcmp(conn_opt->keyword, "dbname") == 0 && conn_opt->val != NULL)
{
if (dbname)
*dbname = pg_strdup(conn_opt->val);
continue;
}
```

There is a memory-leak if multiple dbname are specified in the conninfo.

29.

```
pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL)));
```

No need to initialize the seed every time. Can you reuse pg_prng_state?

30.

```
if (num_replslots == 0)
dbinfo[i].replslotname = pg_strdup(genname);
```

I think the straightforward way is to use the name of subscription if no name
is specified. This follows the rule for CREATE SUBSCRIPTION.

31.

```
/* Create replication slot on publisher */
if (lsn)
pg_free(lsn);
```

I think allocating/freeing memory is not so efficient.
Can we add a flag to create_logical_replication_slot() for controlling the
returning value (NULL or duplicated string)? We can use the condition (i == num_dbs-1)
as flag.

32.

```
/*
* Close the connection. If exit_on_error is true, it has an undesired
* condition and it should exit immediately.
*/
static void
disconnect_database(PGconn *conn, bool exit_on_error)
```

In case of disconnect_database(), the second argument should have different name.
If it is true, the process exits unconditionally.
Also, comments atop the function must be fixed.

33.

```
wal_level = strdup(PQgetvalue(res, 0, 0));
```

pg_strdup should be used here.

34.

```
{"config-file", required_argument, NULL, 1},
{"publication", required_argument, NULL, 2},
{"replication-slot", required_argument, NULL, 3},
{"subscription", required_argument, NULL, 4},
```

The ordering looks strange for me. According to pg_upgarade and pg_basebackup,
options which do not have short notation are listed behind.

35.

```
opt.sub_port = palloc(16);
```

Per other lines, pg_alloc() should be used.

36.

```
pg_free(opt.sub_port);
```

You said that the leak won't be concerned here. If so, why only 'p' has pg_free()?

37.

```
/* Register a function to clean up objects in case of failure */
atexit(cleanup_objects_atexit);
```

Sorry if we have already discussed. I think the registration can be moved just
before the boot of the standby. Before that, the callback will be no-op.

38.

```
/* Subscriber PID file */
snprintf(pidfile, MAXPGPATH, "%s/postmaster.pid", subscriber_dir);

/*
* If the standby server is running, stop it. Some parameters (that can
* only be set at server start) are informed by command-line options.
*/
if (stat(pidfile, &statbuf) == 0)
```

Hmm. pidfile is used only here, but it is declared in main(). Can it be
separated into another funtion like is_standby_started()?

39.

Or, we may able to introcue "restart_standby_if_needed" or something.

40.

```
* XXX this code was extracted from BootStrapXLOG().
```

So, can we extract the common part to somewhere? Since system identifier is related
with the controldata file, I think it can be located in controldata_util.c.

41.

You said like below in [1], but I could not find the related fix. Can you clarify?

> That's a good point. We should state in the documentation that GUCs specified in
> the command-line options are ignored during the execution.

[1]: https://www.postgresql.org/message-id/40595e73-c7e1-463a-b8be-49792e870007%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2024-03-18 05:50:02 Re: pg16: XX000: could not find pathkey item to sort
Previous Message torikoshia 2024-03-18 05:39:48 Re: RFC: Logging plan of the running query