RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-02-16 12:53:43
Message-ID: TYCPR01MB12077756323B79042F29DDAEDF54C2@TYCPR01MB12077.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Euler,

Here are comments for v21.

01. main
```
/* rudimentary check for a data directory. */
...
/* subscriber PID file. */
```

Initial char must be upper, and period is not needed.

02. check_data_directory
```
snprintf(versionfile, MAXPGPATH, "%s/PG_VERSION", datadir);
```

You removed the version checking from PG_VERSION, but I think it is still needed.
Indeed v21 can detect the pg_ctl/pg_resetwal/pg_createsubscriber has different
verson, but this cannot ditect the started instance has the differnet version.
I.e., v20-0010 is partially needed.

03. store_pub_sub_info()
```
SimpleStringListCell *cell;
```

This definition can be in loop variable.

04. get_standby_sysid()
```
pfree(cf);
```

This can be pg_pfree().

05. check_subscriber
```
/* The target server must be a standby */
if (server_is_in_recovery(conn) == 0)
{
pg_log_error("The target server is not a standby");
return false;
}
```

What if the the function returns -1? Should we ditect (maybe the disconnection) here?

06. server_is_in_recovery
```
ret = strcmp("t", PQgetvalue(res, 0, 0));

PQclear(res);

if (ret == 0)
return 1;
else if (ret > 0)
return 0;
else
return -1; /* should not happen */
```

But strcmp may return a negative value, right? Based on the comment atop function,
we should not do it. I think we can use ternary operator instead.

07. server_is_in_recovery

As the fisrt place, no one consider this returns -1. So can we change the bool
function and raise pg_fatal() in case of the error?

08. check_subscriber
```
if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
{
pg_log_error("permission denied for function \"%s\"",
"pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
return false;
}
```

I think the third argument must be 2.

09. check_subscriber
```
pg_log_debug("subscriber: primary_slot_name: %s", primary_slot_name);
```

The output seems strange if the primary_slot_name is not set.

10. setup_publisher()
```
PGconn *conn;
PGresult *res;
```

Definitions can be in the loop.

11. create_publication()
```
if (PQntuples(res) == 1)
{
/*
* If publication name already exists and puballtables is true, let's
* use it. A previous run of pg_createsubscriber must have created
* this publication. Bail out.
*/
```

Hmm, but pre-existing publications may not send INSERT/UPDATE/DELETE/TRUNCATE.
They should be checked if we really want to reuse.
(I think it is OK to just raise ERROR)

12. create_publication()

Based on above, we do not have to check before creating publicatios. The publisher
can detect the duplication. I prefer it.

13. create_logical_replication_slot()
```
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
pg_log_error("could not create replication slot \"%s\" on database \"%s\": %s",
slot_name, dbinfo->dbname,
PQresultErrorMessage(res));
return lsn;
}
```

I know lsn is always NULL, but can we use `return NULL`?

14. setup_subscriber()
```
PGconn *conn;

```

This definition can be in the loop.

15.

You said in case of failure, cleanups is not needed if the process exits soon [1].
But some functions call PQfinish() then exit(1) or pg_fatal(). Should we follow?

16.

Some places refer PGresult or PGConn even after the cleanup. They must be fixed.
```
PQclear(res);
disconnect_database(conn);
pg_fatal("could not get system identifier: %s",
PQresultErrorMessage(res));
```

I think this is a root cause why sometimes the wrong error message has output.

17.

Some places call PQerrorMessage() and other places call PQresultErrorMessage().
I think it PQerrorMessage() should be used only after the connection establishment
functions. Thought?

18. 041_pg_createsubscriber_standby.pl
```
use warnings;
```

We must set "FATAL = all";

19.
```
my $node_p;
my $node_f;
my $node_s;
my $node_c;
my $result;
my $slotname;
```

I could not find forward declarations in perl file.
The node name might be bit a consuging, but I could not find better name.

20.
```
# On node P
# - create databases
# - create test tables
# - insert a row
# - create a physical replication slot
$node_p->safe_psql(
'postgres', q(
CREATE DATABASE pg1;
CREATE DATABASE pg2;
));
$node_p->safe_psql('pg1', 'CREATE TABLE tbl1 (a text)');
$node_p->safe_psql('pg1', "INSERT INTO tbl1 VALUES('first row')");
$node_p->safe_psql('pg2', 'CREATE TABLE tbl2 (a text)');
my $slotname = 'physical_slot';
$node_p->safe_psql('pg2',
"SELECT pg_create_physical_replication_slot('$slotname')");
```

I think setting of the same node can be gathered into one place.
Also, any settings and definitions should be done just before they are used.

21.
```
$node_s->append_conf(
'postgresql.conf', qq[
log_min_messages = debug2
primary_slot_name = '$slotname'
]);
```

I could not find a reason why we set to debug2.

22.
```
command_fails
```

command_checks_all() can check returned value and outputs.
Should we use it?

23.

Can you add headers for each testcases? E.g.,

```
# ------------------------------
# Check pg_createsubscriber fails when the target server is not a
# standby of the source.
...
# ------------------------------
# Check pg_createsubscriber fails when the target server is not running
...
# ------------------------------
# Check pg_createsubscriber fails when the target server is a member of
# the cascading standby.
...
# ------------------------------
# Check successful dry-run
...
# ------------------------------
# Check successful conversion
```

24.
```
# Stop node C
$node_c->teardown_node;
...
$node_p->stop;
$node_s->stop;
$node_f->stop;
```
Why you choose the teardown?

25.

The creation of subscriptions are not directory tested. @subnames contains the
name of subscriptions, but it just assumes the number of them is more than two.

Since it may be useful, I will post top-up patch on Monday, if there are no updating.

[1]: https://www.postgresql.org/message-id/89ccf72b-59f9-4317-b9fd-1e6d20a0c3b1%40app.fastmail.com
[2]: https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-02-16 12:54:16 Re: POC, WIP: OR-clause support for indexes
Previous Message Bharath Rupireddy 2024-02-16 12:53:00 Re: A proposal to provide a timeout option for CREATE_REPLICATION_SLOT/pg_create_logical_replication_slot