Re: speed up a logical replica setup

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-01-01 10:14:21
Message-ID: CALDaNm098Jkbh+ye6zMj9Ro9j1bBe6FfPV80BFbs1=pUuTJ07g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 6 Dec 2023 at 12:53, Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point? All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work. Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn. Nobody has complained about any of the other
> > tools in there.
>
> Not sure. We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skip when should do
> modifications. It is an useful option to check if you will be able to run it
> without having issues with connectivity, permission, and existing objects
> (replication slots, publications, subscriptions). Tests were slightly improved.
> Messages were changed to *not* provide INFO messages by default and --verbose
> provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> also refactored the connect_database() function into which the connection will
> always use the logical replication mode. A bug was fixed in the transient
> replication slot name. Ashutosh review [1] was included. The code was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
> connection issue.
> 2. remove the physical replication slot if the standby is using one
> (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
> stop the replication between the nodes and remove the replication setup
> (publications, subscriptions, replication slots). Or even include another
> action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.

1) This Assert can fail if source is shutdown:
+static void
+drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
char *slot_name)
+{
+ PQExpBuffer str = createPQExpBuffer();
+ PGresult *res;
+
+ Assert(conn != NULL);

I could simulate it by shutting the primary while trying to reach the
consistent state:
pg_subscriber: postmaster reached the consistent state
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: error: connection to database failed: connection to
server at "localhost" (127.0.0.1), port 5432 failed: Connection
refused
Is the server running on that host and accepting TCP/IP connections?
pg_subscriber: pg_subscriber.c:692: drop_replication_slot: Assertion
`conn != ((void *)0)' failed.
Aborted

2) Should we have some checks to see if the max replication slot
configuration is ok based on the number of slots that will be created,
we have similar checks in upgrade replication slots in
check_new_cluster_logical_replication_slots

3) Should we check if wal_level is set to logical, we have similar
checks in upgrade replication slots in
check_new_cluster_logical_replication_slots

4) The physical replication slot that was created will still be
present in the primary node, I felt this should be removed.

5) I felt the target server should be started before completion of
pg_subscriber:
+ /*
+ * Stop the subscriber.
+ */
+ pg_log_info("stopping the subscriber");
+
+ pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path,
subscriber_dir);
+ rc = system(pg_ctl_cmd);
+ pg_ctl_status(pg_ctl_cmd, rc, 0);
+
+ /*
+ * Change system identifier.
+ */
+ modify_sysid(pg_resetwal_path, subscriber_dir);
+
+ success = true;
+
+ pg_log_info("Done!");
+
+ return 0;

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-01-01 10:25:16 Update for copyright messages to 2024 (Happy New Year!)
Previous Message Shubham Khanna 2024-01-01 09:03:45 Re: Assorted typo fixes