RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-02-02 09:41:35
Message-ID: TY3PR01MB98896B904BB259B559622773F5422@TY3PR01MB9889.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Euler,

Thanks for updating the patch!

>
I'm still working on the data structures to group options. I don't like the way
it was grouped in v13-0005. There is too many levels to reach database name.
The setup_subscriber() function requires the 3 data structures.
>

Right, your refactoring looks fewer stack. So I pause to revise my refactoring
patch.

>
The documentation update is almost there. I will include the modifications in
the next patch.
>

OK. I think it should be modified before native speakers will attend to the
thread.

>
Regarding v13-0004, it seems a good UI that's why I wrote a comment about it.
However, it comes with a restriction that requires a similar HBA rule for both
regular and replication connections. Is it an acceptable restriction? We might
paint ourselves into the corner. A reasonable proposal is not to remove this
option. Instead, it should be optional. If it is not provided, primary_conninfo
is used.
>

I didn't have such a point of view. However, it is not related whether -P exists
or not. Even v14-0001 requires primary to accept both normal/replication connections.
If we want to avoid it, the connection from pg_createsubscriber can be restored
to replication-connection.
(I felt we do not have to use replication protocol even if we change the connection mode)

The motivation why -P is not needed is to ensure the consistency of nodes.
pg_createsubscriber assumes that the -P option can connect to the upstream node,
but no one checks it. Parsing two connection strings may be a solution but be
confusing. E.g., what if some options are different?
I think using a same parameter is a simplest solution.

And below part contains my comments for v14.

01.
```
char temp_replslot[NAMEDATALEN] = {0};
```

I found that no one refers the name of temporary slot. Can we remove the variable?

02.
```
CreateSubscriberOptions opt;
...
memset(&opt, 0, sizeof(CreateSubscriberOptions));

/* Default settings */
opt.subscriber_dir = NULL;
opt.pub_conninfo_str = NULL;
opt.sub_conninfo_str = NULL;
opt.database_names = (SimpleStringList)
{
NULL, NULL
};
opt.retain = false;
opt.recovery_timeout = 0;
```

Initialization by `CreateSubscriberOptions opt = {0};` seems enough.
All values are set to 0x0.

03.
```
/*
* Is the standby server ready for logical replication?
*/
static bool
check_subscriber(LogicalRepInfo *dbinfo)
```

You said "target server must be a standby" in [1], but I cannot find checks for it.
IIUC, there are two approaches:
a) check the existence "standby.signal" in the data directory
b) call an SQL function "pg_is_in_recovery"

04.
```
static char *pg_ctl_path = NULL;
static char *pg_resetwal_path = NULL;
```

I still think they can be combined as "bindir".

05.

```
/*
* Write recovery parameters.
...
WriteRecoveryConfig(conn, opt.subscriber_dir, recoveryconfcontents);
```

WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but not
sure it is good. These settings would remain on new subscriber even after the
pg_createsubscriber. Can we avoid it? I come up with passing these parameters
via pg_ctl -o option, but it requires parsing output from GenerateRecoveryConfig()
(all GUCs must be allign like "-c XXX -c XXX -c XXX...").

06.
```
static LogicalRepInfo *store_pub_sub_info(SimpleStringList dbnames, const char *pub_base_conninfo, const char *sub_base_conninfo);
...
static void modify_subscriber_sysid(const char *pg_resetwal_path, CreateSubscriberOptions opt);
...
static void wait_for_end_recovery(const char *conninfo, CreateSubscriberOptions opt);
```

Functions arguments should not be struct because they are passing by value.
They should be a pointer. Or, for modify_subscriber_sysid and wait_for_end_recovery,
we can pass a value which would be really used.

07.
```
static char *get_base_conninfo(char *conninfo, char *dbname,
const char *noderole);
```

Not sure noderole should be passed here. It is used only for the logging.
Can we output string before calling the function?
(The parameter is not needed anymore if -P is removed)

08.
The terminology is still not consistent. Some functions call the target as standby,
but others call it as subscriber.

09.
v14 does not work if the standby server has already been set recovery_target*
options. PSA the reproducer. I considered two approaches:

a) raise an ERROR when these parameter were set. check_subscriber() can do it
b) overwrite these GUCs as empty strings.

10.
The execution always fails if users execute --dry-run just before. Because
pg_createsubscriber stops the standby anyway. Doing dry run first is quite normal
use-case, so current implementation seems not user-friendly. How should we fix?
Below bullets are my idea:

a) avoid stopping the standby in case of dry_run: seems possible.
b) accept even if the standby is stopped: seems possible.
c) start the standby at the end of run: how arguments like pg_ctl -l should be specified?

My top-up patches fixes some issues.

v15-0001: same as v14-0001
=== experimental patches ===
v15-0002: Use replication connections when we connects to the primary.
Connections to standby is not changed because the standby/subscriber
does not require such type of connection, in principle.
If we can accept connecting to subscriber with replication mode,
this can be simplified.
v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004
v15-0004: Check whether the target is really standby. This is done by pg_is_in_recovery()
v15-0005: Avoid stopping/starting standby server in dry_run mode.
I.e., approach a). in #10 is used.
v15-0006: Overwrite recovery parameters. I.e., aproach b). in #9 is used.

[1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

Attachment Content-Type Size
v15-0001-Creates-a-new-logical-replica-from-a-standby-ser.patch application/octet-stream 78.2 KB
v15-0002-Use-replication-connection-when-we-connect-to-th.patch application/octet-stream 5.2 KB
v15-0003-Remove-P-and-use-primary_conninfo-instead.patch application/octet-stream 13.0 KB
v15-0004-Check-whether-the-target-is-really-standby.patch application/octet-stream 1.2 KB
v15-0005-Avoid-stopping-starting-standby-server-in-dry_ru.patch application/octet-stream 2.6 KB
v15-0006-Overwrite-recovery-parameters.patch application/octet-stream 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-02-02 09:47:59 Re: Emitting JSON to file using COPY TO
Previous Message John Naylor 2024-02-02 09:21:01 Re: Change GUC hashtable to use simplehash?