RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-01-12 10:31:01
Message-ID: TY3PR01MB9889C362FF76102C88FA1C29F56F2@TY3PR01MB9889.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Euler,

Here are comments for your v5 patch.

01.
In the document, two words target/standby are used as almost the same meaning.
Can you unify them?

02.
```
<refsynopsisdiv>
<cmdsynopsis>
<command>pg_subscriber</command>
<arg rep="repeat"><replaceable>option</replaceable></arg>
</cmdsynopsis>
</refsynopsisdiv>
```

There are some mandatory options like -D/-S/-P. It must be listed in Synopsis chapter.

03.
```
<para>
<application>pg_subscriber</application> takes the publisher and subscriber
connection strings, a cluster directory from a standby server and a list of
database names and it sets up a new logical replica using the physical
recovery process.
</para>
```

I briefly checked other pages and they do not describe accepted options here.
A summary of the application should be mentioned. Based on that, how about:

```
pg_subscriber creates a new <link linkend="logical-replication-subscription">
subscriber</link> from a physical standby server. This allows users to quickly
set up logical replication system.
```

04.
```
<para>
The <application>pg_subscriber</application> should be run at the target
server. The source server (known as publisher server) should accept logical
replication connections from the target server (known as subscriber server).
The target server should accept local logical replication connection.
</para>
```

I'm not native speaker, but they are not just recommmendations - they are surely
required. So, should we replace s/should/has to/?

05.
```
<varlistentry>
<term><option>-S <replaceable class="parameter">conninfo</replaceable></option></term>
<term><option>--subscriber-conninfo=<replaceable class="parameter">conninfo</replaceable></option></term>
<listitem>
<para>
The connection string to the subscriber. For details see <xref linkend="libpq-connstring"/>.
</para>
</listitem>
</varlistentry>
```

I became not sure whether it is "The connection string to the subscriber.".
The server is still physical standby at that time.

06.
```
* IDENTIFICATION
* src/bin/pg_subscriber/pg_subscriber.c
```

The identification is not correct.

07.
I felt that there were too many global variables and LogicalRepInfo should be
refactored. Because...

* Some info related with clusters(e.g., subscriber_dir, conninfo, ...) should be
gathered in one struct.
* pubconninfo/subsconninfo are stored per db, but it is not needed if we have
one base_conninfo.
* pubname/subname are not needed because we have fixed naming rule.
* pg_ctl_path and pg_resetwal_path can be conbimed into one bindir.
* num_dbs should not be alone.
...

Based on above, how about using structures like below?

```
typedef struct LogicalRepPerdbInfo
{
Oid oid;
char *dbname;
bool made_replslot; /* replication slot was created */
bool made_publication; /* publication was created */
bool made_subscription; /* subscription was created */
} LogicalRepPerdbInfo;

typedef struct PrimaryInfo
{
char *base_conninfo;
bool made_transient_replslot;
} PrimaryInfo;

typedef struct StandbyInfo
{
char *base_conninfo;
char *bindir;
char *pgdata;
char *primary_slot_name;
} StandbyInfo;

typedef struct LogicalRepInfo
{
int num_dbs;
LogicalRepPerdbInfo *perdb;
PrimaryInfo *primary;
StandbyInfo *standby;
} LogicalRepInfo;
```

08.
```
char *subconninfo; /* subscription connection string for logical
* replication */
```

Not sure how we should notate because the target has not been subscriber yet.

09.
```
enum WaitPMResult
{
POSTMASTER_READY,
POSTMASTER_STANDBY,
POSTMASTER_STILL_STARTING,
POSTMASTER_FAILED
};
```

This enum has been already defined in pg_ctl.c. Not sure we can use the same name.
Can we rename to PGSWaitPMResult. or export pre-existing one?

10.
```
/* Options */
static const char *progname;
```

I think it is not an option.

11.
```
/*
* Validate a connection string. Returns a base connection string that is a
* connection string without a database name plus a fallback application name.
* Since we might process multiple databases, each database name will be
* appended to this base connection string to provide a final connection string.
* If the second argument (dbname) is not null, returns dbname if the provided
* connection string contains it. If option --database is not provided, uses
* dbname as the only database to setup the logical replica.
* It is the caller's responsibility to free the returned connection string and
* dbname.
*/
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
```
Just FYI - adding fallback_application_name may be too optimisitic. Currently
the output was used by both pg_subscriber and subscription connection.

12.
Can we add an option not to remove log files even operations were succeeded.

13.
```
/*
* Since the standby server is running, check if it is using an
* existing replication slot for WAL retention purposes. This
* replication slot has no use after the transformation, hence, it
* will be removed at the end of this process.
*/
primary_slot_name = use_primary_slot_name();
```

Now primary_slot_name is checked only when the server have been started, but
it should be checked in any cases.

14.
```
consistent_lsn = create_logical_replication_slot(conn, &dbinfo[0],
temp_replslot);
```

Can we create a temporary slot here?

15.
I found that subscriptions cannot be started if tuples are inserted on publisher
after creating temp_replslot. After starting a subscriber, I got below output on the log.

```
ERROR: could not receive data from WAL stream: ERROR: publication "pg_subscriber_5" does not exist
CONTEXT: slot "pg_subscriber_5_3632", output plugin "pgoutput", in the change callback, associated LSN 0/30008A8
LOG: background worker "logical replication apply worker" (PID 3669) exited with exit code 1
```

But this is strange. I confirmed that the specified publication surely exists.
Do you know the reason?

```
publisher=# SELECT pubname FROM pg_publication;
pubname
-----------------
pg_subscriber_5
(1 row)
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-01-12 10:32:26 Re: tablecmds.c/MergeAttributes() cleanup
Previous Message Christoph Berg 2024-01-12 10:14:28 plperl and perl 5.38