Re: speed up a logical replica setup

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: speed up a logical replica setup
Date: 2022-03-15 13:51:11
Message-ID: b9aa614c-84ba-a869-582f-8d5e3ab57424@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21.02.22 13:09, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly
> integrated
> with Postgres.

Are we comfortable with the name pg_subscriber? It seems too general.
Are we planning other subscriber-related operations in the future? If
so, we should at least make this one use a --create option or
something like that.

doc/src/sgml/ref/pg_subscriber.sgml

Attached is a patch that reorganizes the man page a bit. I moved the
description of the steps to the Notes section and formatted it
differently. I think the steps are interesting but not essential for
the using of the program, so I wanted to get them out of the main
description.

src/bin/pg_subscriber/pg_subscriber.c

+ if (made_temp_replslot)
+ {
+ conn = connect_database(dbinfo[0].pubconninfo, true);
+ drop_replication_slot(conn, &dbinfo[0], temp_replslot);
+ disconnect_database(conn);
+ }

Temp slots don't need to be cleaned up.

+/*
+ * Obtain the system identifier from control file. It will be used to
compare
+ * if a data directory is a clone of another one. This routine is used
locally
+ * and avoids a replication connection.
+ */
+static char *
+get_control_from_datadir(const char *datadir)

This could return uint64 directly, without string conversion.
get_sysid_from_conn() could then convert to uint64 internally.

+ {"verbose", no_argument, NULL, 'v'},

I'm not sure if the --verbose option is all that useful.

+ {"stop-subscriber", no_argument, NULL, 1},

This option doesn't seem to be otherwise supported or documented.

+ pub_sysid = pg_malloc(32);
+ pub_sysid = get_sysid_from_conn(dbinfo[0].pubconninfo);
+ sub_sysid = pg_malloc(32);
+ sub_sysid = get_control_from_datadir(subscriber_dir);

These mallocs don't appears to be of any use.

+ dbname_conninfo = pg_malloc(NAMEDATALEN);

This seems wrong.

Overall, this code could use a little bit more structuring. There are
a lot of helper functions that don't seem to do a lot and are mostly
duplicate runs-this-SQL-command calls. But the main() function is
still huge. There is room for refinement.

src/bin/pg_subscriber/t/001_basic.pl

Good start, but obviously, we'll need some real test cases here also.

src/bin/initdb/initdb.c
src/bin/pg_ctl/pg_ctl.c
src/common/file_utils.c
src/include/common/file_utils.h

I recommend skipping this refactoring. The readfile() function from
pg_ctl is not general enough to warrant the pride of place of a
globally available function. Note that it is specifically geared
toward some of pg_ctl's requirements, for example that the underlying
file can change while it is being read.

The requirements of pg_subscriber can be satisfied more easily: Just
call pg_ctl to start the server. You are already using that in
pg_subscriber. Is there a reason it can't be used here as well?

Attachment Content-Type Size
0001-fixup-Create-a-new-logical-replica-from-a-base-backu.patch text/plain 8.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2022-03-15 14:00:52 RE: Skipping logical replication transactions on subscriber side
Previous Message Dong Wook Lee 2022-03-15 13:50:25 Re: Add pg_freespacemap extension sql test