Re: speed up a logical replica setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2023-12-21 06:16:35
Message-ID: CAA4eK1+yUsZ7SDU-NDfcgdVQM4UOTK_bhLx2z1a0CHkEidvHow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
>
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point? All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work. Say
> >> a replication_tools?
> >
> > Seems like unnecessary churn. Nobody has complained about any of the other
> > tools in there.
>
> Not sure. We rename things across releases in the tree from time to
> time, and here that's straight-forward.
>
>
> Based on this discussion it seems we have a consensus that this tool should be
> in the pg_basebackup directory. (If/when we agree with the directory renaming,
> it could be done in a separate patch.) Besides this move, the v3 provides a dry
> run mode. It basically executes every routine but skip when should do
> modifications. It is an useful option to check if you will be able to run it
> without having issues with connectivity, permission, and existing objects
> (replication slots, publications, subscriptions). Tests were slightly improved.
> Messages were changed to *not* provide INFO messages by default and --verbose
> provides INFO messages and --verbose --verbose also provides DEBUG messages. I
> also refactored the connect_database() function into which the connection will
> always use the logical replication mode. A bug was fixed in the transient
> replication slot name. Ashutosh review [1] was included. The code was also indented.
>
> There are a few suggestions from Ashutosh [2] that I will reply in another
> email.
>
> I'm still planning to work on the following points:
>
> 1. improve the cleanup routine to point out leftover objects if there is any
> connection issue.
>

I think this is an important part. Shall we try to write to some file
the pending objects to be cleaned up? We do something like that during
the upgrade.

> 2. remove the physical replication slot if the standby is using one
> (primary_slot_name).
> 3. provide instructions to promote the logical replica into primary, I mean,
> stop the replication between the nodes and remove the replication setup
> (publications, subscriptions, replication slots). Or even include another
> action to do it. We could add both too.
>
> Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
> UI for users that would like to use it.
>

Isn't point 2 also essential because how would otherwise such a slot
be advanced or removed?

A few other points:
==============
1. Previously, I asked whether we need an additional replication slot
patch created to get consistent LSN and I see the following comment in
the patch:

+ *
+ * XXX we should probably use the last created replication slot to get a
+ * consistent LSN but it should be changed after adding pg_basebackup
+ * support.

Yeah, sure, we may want to do that after backup support and we can
keep a comment for the same but I feel as the patch stands today,
there is no good reason to keep it. Also, is there a reason that we
can't create the slots after backup is complete and before we write
recovery parameters

2.
+ appendPQExpBuffer(str,
+ "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
+ "WITH (create_slot = false, copy_data = false, enabled = false)",
+ dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);

Shouldn't we enable two_phase by default for newly created
subscriptions? Is there a reason for not doing so?

3. How about sync slots on the physical standby if present? Do we want
to retain those as it is or do we need to remove those? We are
actively working on the patch [1] for the same.

4. Can we see some numbers with various sizes of databases (cluster)
to see how it impacts the time for small to large-size databases as
compared to the traditional method? This might help us with giving
users advice on when to use this tool. We can do this bit later as
well when the patch is closer to being ready for commit.

[1] - https://www.postgresql.org/message-id/OS0PR01MB5716DAF72265388A2AD424119495A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-12-21 06:23:04 Re: Function to get invalidation cause of a replication slot.
Previous Message Zhijie Hou (Fujitsu) 2023-12-21 05:59:52 RE: Synchronizing slots from primary to standby