Re: speed up a logical replica setup

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-01-19 08:48:53
Message-ID: CAHv8RjKYo1Xdkj6WhFZ32Mg4-+9EuNgau=JZH8U7_NPv2gFmGQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 16, 2024 at 11:58 AM Shubham Khanna
<khannashubham1197(at)gmail(dot)com> wrote:
>
> On Thu, Dec 21, 2023 at 11:47 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> > >
> > > On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
> > >
> > > On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > > > On 08.11.23 00:12, Michael Paquier wrote:
> > > >> - Should the subdirectory pg_basebackup be renamed into something more
> > > >> generic at this point? All these things are frontend tools that deal
> > > >> in some way with the replication protocol to do their work. Say
> > > >> a replication_tools?
> > > >
> > > > Seems like unnecessary churn. Nobody has complained about any of the other
> > > > tools in there.
> > >
> > > Not sure. We rename things across releases in the tree from time to
> > > time, and here that's straight-forward.
> > >
> > >
> > > Based on this discussion, it seems we have a consensus that this tool
> > > should be in the pg_basebackup directory. (If/when we agree on the
> > > directory renaming, it could be done in a separate patch.) Besides this
> > > move, v3 provides a dry run mode. It basically executes every routine but
> > > skips the steps that would make modifications. It is a useful option to
> > > check whether you will be able to run it without hitting issues with
> > > connectivity, permissions, or existing objects (replication slots,
> > > publications, subscriptions). Tests were slightly improved. Messages were
> > > changed to *not* provide INFO messages by default; --verbose provides
> > > INFO messages and --verbose --verbose also provides DEBUG messages. I
> > > also refactored the connect_database() function so that the connection
> > > always uses the logical replication mode. A bug was fixed in the
> > > transient replication slot name. Ashutosh's review [1] was incorporated.
> > > The code was also indented.
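
For reference, a dry run invocation would presumably look like the line
below (--verbose matches the description above; the exact spelling of the
dry run option and any connection options are my assumption):

    pg_subscriber --dry-run --verbose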
> > >
> > > There are a few suggestions from Ashutosh [2] that I will reply to in
> > > another email.
> > >
> > > I'm still planning to work on the following points:
> > >
> > > 1. improve the cleanup routine to point out leftover objects if there is any
> > > connection issue.
> > >
> >
> > I think this is an important part. Shall we try to write to some file
> > the pending objects to be cleaned up? We do something like that during
> > the upgrade.
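
For illustration, such a file could simply contain ready-to-run cleanup
statements for whatever was created before the failure; the object names
below are hypothetical:

    -- run on the subscriber
    DROP SUBSCRIPTION IF EXISTS pg_subscriber_sub_16384;
    -- run on the publisher
    SELECT pg_drop_replication_slot('pg_subscriber_slot_16384');
    DROP PUBLICATION IF EXISTS pg_subscriber_pub_16384;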
> >
> > > 2. remove the physical replication slot if the standby is using one
> > > (primary_slot_name).
> > > 3. provide instructions to promote the logical replica into a primary,
> > > i.e., stop the replication between the nodes and remove the replication
> > > setup (publications, subscriptions, replication slots). Or even include
> > > another action to do it. We could add both too.
> > >
> > > Point 1 should be done. Points 2 and 3 aren't essential but will
> > > provide a nice UI for users who would like to use it.
> > >
> >
> > Isn't point 2 also essential? Otherwise, how would such a slot be
> > advanced or removed?
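
For reference, removing such a slot manually on the primary would look
like the statement below, where the slot name comes from the standby's
primary_slot_name setting ('standby_1_slot' is a placeholder):

    SELECT pg_drop_replication_slot('standby_1_slot');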
> >
> > A few other points:
> > ==============
> > 1. Previously, I asked whether we need an additional replication slot
> > created to get a consistent LSN, and I see the following comment in
> > the patch:
> >
> > + *
> > + * XXX we should probably use the last created replication slot to get a
> > + * consistent LSN but it should be changed after adding pg_basebackup
> > + * support.
> >
> > Yeah, sure, we may want to do that after adding backup support, and we
> > can keep a comment for it, but as the patch stands today, I feel there
> > is no good reason to keep it. Also, is there a reason that we can't
> > create the slots after the backup is complete and before we write the
> > recovery parameters?
> >
> > 2.
> > + appendPQExpBuffer(str,
> > + "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> > + "WITH (create_slot = false, copy_data = false, enabled = false)",
> > + dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
> >
> > Shouldn't we enable two_phase by default for newly created
> > subscriptions? Is there a reason for not doing so?
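
If it were enabled, the generated statement would presumably look like
the following (subscription, connection, and publication names here are
placeholders):

    CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' PUBLICATION pub1
        WITH (create_slot = false, copy_data = false, enabled = false,
              two_phase = true);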
> >
> > 3. How about sync slots on the physical standby, if present? Do we
> > want to retain those as they are or do we need to remove them? We are
> > actively working on the patch [1] for the same.
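
For reference, the slots present on the standby can be inspected with a
query like this before deciding how to handle them:

    SELECT slot_name, slot_type, active FROM pg_replication_slots;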
> >
> > 4. Can we see some numbers with various sizes of databases (cluster)
> > to see how it impacts the time for small to large-size databases as
> > compared to the traditional method? This might help us with giving
> > users advice on when to use this tool. We can do this bit later as
> > well when the patch is closer to being ready for commit.
>
> I have done the performance testing and attached the results to compare
> the 'Execution Time' between 'logical replication' and 'pg_subscriber'
> for 100MB, 1GB and 5GB of data:
>
>                    |  100MB |     1GB |     5GB
> Logical rep (2 w)  | 1.815s | 14.895s | 75.541s
> Logical rep (4 w)  | 1.194s |  9.484s | 46.938s
> Logical rep (8 w)  | 0.828s |  6.422s | 31.704s
> Logical rep (10 w) | 0.646s |  3.843s | 18.425s
> pg_subscriber      | 3.977s |  9.988s | 12.665s
>
> Here, 'w' stands for 'workers'. I included tests to see how the results
> vary with different values of 'max_sync_workers_per_subscription',
> ranging from 2 to 10. I ran the tests with different record counts: for
> 100MB I inserted 300,000 records, for 1GB 3,000,000 records, and for 5GB
> 15,000,000 records. It is observed that 'pg_subscriber' performs better
> as the data size grows.
> Next, I plan to run these tests for 10GB and 20GB to see if this trend
> continues.

I have done the performance testing and attached the results to compare
the 'Execution Time' between 'logical replication' and 'pg_subscriber'
for 10GB and 20GB of data:

                   |     10GB |     20GB
Logical rep (2 w)  | 157.131s | 343.191s
Logical rep (4 w)  | 116.627s | 240.480s
Logical rep (8 w)  |  95.237s | 275.715s
Logical rep (10 w) |  92.792s | 280.538s
pg_subscriber      |  22.734s |  25.661s

As expected, pg_subscriber performs much better in these cases: it is
approximately 7x faster for 10GB and 13x faster for 20GB.
I'm attaching the script files used for the tests; the Excel file has the
test run details. The 'pg_subscriber.pl' file is for 'Streaming
Replication' and the 'logical_replication.pl' file is for 'Logical
Replication'.
Note: For 20GB, the record count should be changed to 60,000,000, and
'max_sync_workers_per_subscription' needs to be adjusted for the logical
replication tests with different worker counts.
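
For reference, the worker setting can be changed between runs with
something like this (the value 8 is just an example):

    ALTER SYSTEM SET max_sync_workers_per_subscription = 8;
    SELECT pg_reload_conf();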

Thanks and Regards,
Shubham Khanna.

Attachment Content-Type Size
logical_replication.pl application/octet-stream 2.8 KB
pg_subscriber.pl application/octet-stream 3.3 KB
time_stamp(comparison).xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 11.5 KB
