Re: speed up a logical replica setup

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-02-06 06:40:47
Message-ID: CAHv8Rj+AvtnxNyyMAFRVuuzP3Cy5j1zq99wkMUXOUnskPZg5nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 2, 2024 at 3:11 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Euler,
>
> Thanks for updating the patch!
>
> >
> I'm still working on the data structures to group options. I don't like the way
> it was grouped in v13-0005. There is too many levels to reach database name.
> The setup_subscriber() function requires the 3 data structures.
> >
>
> Right, your refactoring looks fewer stack. So I pause to revise my refactoring
> patch.
>
> >
> The documentation update is almost there. I will include the modifications in
> the next patch.
> >
>
> OK. I think it should be modified before native speakers will attend to the
> thread.
>
> >
> Regarding v13-0004, it seems a good UI that's why I wrote a comment about it.
> However, it comes with a restriction that requires a similar HBA rule for both
> regular and replication connections. Is it an acceptable restriction? We might
> paint ourselves into the corner. A reasonable proposal is not to remove this
> option. Instead, it should be optional. If it is not provided, primary_conninfo
> is used.
> >
>
> I didn't have such a point of view. However, it is not related whether -P exists
> or not. Even v14-0001 requires primary to accept both normal/replication connections.
> If we want to avoid it, the connection from pg_createsubscriber can be restored
> to replication-connection.
> (I felt we do not have to use replication protocol even if we change the connection mode)
>
> The motivation why -P is not needed is to ensure the consistency of nodes.
> pg_createsubscriber assumes that the -P option can connect to the upstream node,
> but no one checks it. Parsing two connection strings may be a solution but be
> confusing. E.g., what if some options are different?
> I think using a same parameter is a simplest solution.
>
> And below part contains my comments for v14.
>
> 01.
> ```
> char temp_replslot[NAMEDATALEN] = {0};
> ```
>
> I found that no one refers the name of temporary slot. Can we remove the variable?
>
> 02.
> ```
> CreateSubscriberOptions opt;
> ...
> memset(&opt, 0, sizeof(CreateSubscriberOptions));
>
> /* Default settings */
> opt.subscriber_dir = NULL;
> opt.pub_conninfo_str = NULL;
> opt.sub_conninfo_str = NULL;
> opt.database_names = (SimpleStringList)
> {
> NULL, NULL
> };
> opt.retain = false;
> opt.recovery_timeout = 0;
> ```
>
> Initialization by `CreateSubscriberOptions opt = {0};` seems enough.
> All values are set to 0x0.
>
> 03.
> ```
> /*
> * Is the standby server ready for logical replication?
> */
> static bool
> check_subscriber(LogicalRepInfo *dbinfo)
> ```
>
> You said "target server must be a standby" in [1], but I cannot find checks for it.
> IIUC, there are two approaches:
> a) check the existence "standby.signal" in the data directory
> b) call an SQL function "pg_is_in_recovery"
>
> 04.
> ```
> static char *pg_ctl_path = NULL;
> static char *pg_resetwal_path = NULL;
> ```
>
> I still think they can be combined as "bindir".
>
> 05.
>
> ```
> /*
> * Write recovery parameters.
> ...
> WriteRecoveryConfig(conn, opt.subscriber_dir, recoveryconfcontents);
> ```
>
> WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but not
> sure it is good. These settings would remain on new subscriber even after the
> pg_createsubscriber. Can we avoid it? I come up with passing these parameters
> via pg_ctl -o option, but it requires parsing output from GenerateRecoveryConfig()
> (all GUCs must be allign like "-c XXX -c XXX -c XXX...").
>
> 06.
> ```
> static LogicalRepInfo *store_pub_sub_info(SimpleStringList dbnames, const char *pub_base_conninfo, const char *sub_base_conninfo);
> ...
> static void modify_subscriber_sysid(const char *pg_resetwal_path, CreateSubscriberOptions opt);
> ...
> static void wait_for_end_recovery(const char *conninfo, CreateSubscriberOptions opt);
> ```
>
> Functions arguments should not be struct because they are passing by value.
> They should be a pointer. Or, for modify_subscriber_sysid and wait_for_end_recovery,
> we can pass a value which would be really used.
>
> 07.
> ```
> static char *get_base_conninfo(char *conninfo, char *dbname,
> const char *noderole);
> ```
>
> Not sure noderole should be passed here. It is used only for the logging.
> Can we output string before calling the function?
> (The parameter is not needed anymore if -P is removed)
>
> 08.
> The terminology is still not consistent. Some functions call the target as standby,
> but others call it as subscriber.
>
> 09.
> v14 does not work if the standby server has already been set recovery_target*
> options. PSA the reproducer. I considered two approaches:
>
> a) raise an ERROR when these parameter were set. check_subscriber() can do it
> b) overwrite these GUCs as empty strings.
>
> 10.
> The execution always fails if users execute --dry-run just before. Because
> pg_createsubscriber stops the standby anyway. Doing dry run first is quite normal
> use-case, so current implementation seems not user-friendly. How should we fix?
> Below bullets are my idea:
>
> a) avoid stopping the standby in case of dry_run: seems possible.
> b) accept even if the standby is stopped: seems possible.
> c) start the standby at the end of run: how arguments like pg_ctl -l should be specified?
>
> My top-up patches fixes some issues.
>
> v15-0001: same as v14-0001
> === experimental patches ===
> v15-0002: Use replication connections when we connects to the primary.
> Connections to standby is not changed because the standby/subscriber
> does not require such type of connection, in principle.
> If we can accept connecting to subscriber with replication mode,
> this can be simplified.
> v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004
> v15-0004: Check whether the target is really standby. This is done by pg_is_in_recovery()
> v15-0005: Avoid stopping/starting standby server in dry_run mode.
> I.e., approach a). in #10 is used.
> v15-0006: Overwrite recovery parameters. I.e., aproach b). in #9 is used.
>
> [1]: https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com

While reviewing the v15 patches I discovered that subscription
connection string has added a lot of options which are not required
now:
v15-0001
postgres=# select subname, subconninfo from pg_subscription;
subname | subconninfo
-------------------------------+------------------------------------------
pg_createsubscriber_5_1867633 | host=localhost port=5432 dbname=postgres
(1 row)

v15-0001+0002+0003
postgres=# select subname, subconninfo from pg_subscription;
subname |

subconninfo

-------------------------------+------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
------------------------------
pg_createsubscriber_5_1895366 | user=shubham
passfile='/home/shubham/.pgpass' channel_binding=prefer ho
st=127.0.0.1 port=5432 sslmode=prefer sslcompression=0
sslcertmode=allow sslsni=1 ssl_min_protocol_versi
on=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0
target_session_attrs=any load_balance_
hosts=disable dbname=postgres
(1 row)

Here, we can see that channel_binding, sslmode, sslcertmode, sslsni,
gssencmode, krbsrvname, etc are getting included. This does not look
intentional, we should keep the subscription connection same as in
v15-0001.

Thanks and Regards,
Shubham Khanna.

Thanks and Regards,
Shubham Khanna.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-02-06 06:48:05 Re: Why is subscription/t/031_column_list.pl failing so much?
Previous Message Peter Smith 2024-02-06 06:38:10 Re: Synchronizing slots from primary to standby