Re: speed up a logical replica setup

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-03-21 11:35:00
Message-ID: CALDaNm1Dg5tDRmaabk+ZND4WF17NrNq52WZxCE+90-PGz5trQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 21 Mar 2024 at 09:50, Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
>
> On 16.03.24 16:42, Euler Taveira wrote:
> > I'm attaching a new version (v30) that adds:
>
> I have some review comments and attached a patch with some smaller
> fixups (mainly message wording and avoid fixed-size string buffers).
>
>
> Thanks for your review. I'm attaching a new version (v32) that includes your
> fixups, merges the v30-0002 into the main patch [1], addresses Hayato's review[2],
> your reviews [3][4], and fixes the query for set_replication_progress() [5].
>
> * doc/src/sgml/ref/pg_createsubscriber.sgml
>
> I would remove the "How It Works" section. This is not relevant to
> users, and it is very detailed and will require updating whenever the
> implementation changes. It could be a source code comment instead.
>
>
> It uses the same structure as pg_rewind that also describes how it works
> internally. I included a separate patch that completely removes the section.
>
> * src/bin/pg_basebackup/pg_createsubscriber.c
>
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.
>
>
> get_base_conninfo() uses PQconninfoParse to parse the connection string. I
> expect PQconnectdb to provide a suitable error message in this case. Even if it
> builds keywords and values arrays, it is also susceptible to the same issue, no?
>
> Most SQL commands need to be amended for proper identifier or string
> literal quoting and/or escaping.
>
>
> I completely forgot about this detail when I added the new options in v30. It is
> fixed now. I also changed the tests to exercise it.
>
> In check_subscriber(): All these permissions checks seem problematic
> to me. We shouldn't reimplement our own copy of the server's
> permission checks. The server can check the permissions. And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of. Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever. But I would rather just remove all this, it seems too
> problematic.
>
>
> The main goal of the check_* functions are to minimize error during execution.
> I removed the permission checks. The GUC checks were kept.
>
> In main(): The first check if the standby is running is problematic.
> I think it would be better to require that the standby is initially
> shut down. Consider, the standby might be running under systemd.
> This tool will try to stop it, systemd will try to restart it. Let's
> avoid these kinds of battles. It's also safer if we don't try to
> touch running servers.
>
>
> That's a good point. I hadn't found an excuse to simplify this but you provided
> one. :) There was a worry about ignoring some command-line options that changes
> GUCs if the server was started. There was also an ugly case for dry run mode
> that has to start the server (if it was running) at the end. Both cases are no
> longer issues. The current code provides a suitable error if the target server
> is running.
>
> The -p option (--subscriber-port) doesn't seem to do anything. In my
> testing, it always uses the compiled-in default port.
>
>
> It works for me. See this snippet from the regression tests. The port (50945) is
> used by pg_ctl.
>
> # Running: pg_createsubscriber --verbose --verbose --pgdata /c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' --socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 --database pg2
> pg_createsubscriber: validating connection string on publisher
> .
> .
> pg_createsubscriber: pg_ctl command is: "/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D "/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata" -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp/qpngb0bPKo'"
> 2024-03-20 18:15:24.517 -03 [105195] LOG: starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
> 2024-03-20 18:15:24.517 -03 [105195] LOG: listening on Unix socket "/tmp/qpngb0bPKo/.s.PGSQL.50945"
>
> Printing all the server log lines to the terminal doesn't seem very
> user-friendly. Not sure what to do about that, short of keeping a
> pg_upgrade-style directory of log files. But it's ugly.
>
>
> I removed the previous implementation that creates a new directory and stores
> the log file there. I don't like the pg_upgrade-style directory because (a) it
> stores part of the server log files in another place and (b) it is another
> directory to ignore if your tool handles the data directory (like a backup
> tool). My last test said it prints 35 server log lines. I expect that the user
> redirects the output to a file so he/she can inspect it later if required.

Here are a few suggestions:
1) I felt displaying the server log to the console is not a good idea,
I prefer this to be logged. There were similar comments from
Kuroda-san at [1], Peter at [2]. The number of lines will increase
based on the log level set. If you don't want to use pg_upgrade style,
how about exposing the log file option and logging it to the specified
log file.

2) Currently for publication, replication-slot and subscription, we
will have to specify these options based on the number of databases.
Here if we have 100 databases we will have to specify these options
100 times, it might not be user friendly. How about something like
what Tomas had proposed at [3] and Amit proposed at [4]. It will be
better if the user has to just specify publication, replication slot
and subscription options only one time.
+ /* Number of object names must match number of databases */
+ if (num_pubs > 0 && num_pubs != num_dbs)
+ {
+ pg_log_error("wrong number of publication names");
+ pg_log_error_hint("Number of publication names (%d)
must match number of database names (%d).",
+ num_pubs, num_dbs);
+ exit(1);
+ }

3) Can we have an option similar to dry-run which will display the
configurations required in the primary and standby node something
like:
pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres" -p 9999
-s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2
--suggest-config
Suggested optimal configurations in the primary:
--------------------------------------
wallevel = logical
max_replication_slots = 3
max_wal_senders = 3
...
Suggested optimal configurations in the standby:
--------------------------------------
max_replication_slots = 3
max_wal_senders = 3
...

[1] - https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/7a970912-0b77-4942-84f7-2c9ca0bc05a5%40eisentraut.org
[3] - https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com
[4] - https://www.postgresql.org/message-id/CAA4eK1%2BL_J4GYES6g19xqfpEVD3K2NPCAeu3tzrJfZgu_Fk9Tw%40mail.gmail.com

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii.Yuki@df.MitsubishiElectric.co.jp 2024-03-21 11:37:50 RE: Partial aggregates pushdown
Previous Message Ashutosh Bapat 2024-03-21 11:27:08 Re: DOCS: add helpful partitioning links