Re: WIP/PoC for parallel backup

From: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: David Zhang <david(dot)zhang(at)highgo(dot)ca>, Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-04-29 12:41:07
Message-ID: CAF1DzPWwG_BcxcR5wL17qs=pbZ4iR_ROK_vBZbAKBpn8rM0oWA@mail.gmail.com
Lists: pgsql-hackers

Hi,

We at EnterpriseDB did some performance testing of this
parallel backup patch to check how beneficial it is; the results are below.
In this testing, we ran the backup:
1) Without Asif's patch
2) With Asif's patch, using 1, 2, 4, and 8 workers.

We ran these tests on two setups:

1) Client and server on the same machine (local backups)

2) Client and server on different machines (remote backups)
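
For reference, a run in this setup can be driven with something like the
following sketch. The host, user, and backup paths are placeholders, and the
parallel-worker option name (--jobs) is assumed from Asif's patch and may
differ in the final version:

```shell
# Baseline: stock pg_basebackup, timed with time(1)
time pg_basebackup -h server -U backupuser -c fast -D /backup/base

# Patched build: the same backup with 1, 2, 4, and 8 workers
# (--jobs is the option assumed to be added by the patch)
for n in 1 2 4 8; do
    time pg_basebackup -h server -U backupuser -c fast -D /backup/base_$n --jobs=$n
done
```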

*Machine details:*

1: Server (on which the local backups were performed; also used as the server
for remote backups)

2: Client (used as the client for remote backups)

*Server:*

RAM: 500 GB
CPU details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Filesystem: ext4

*Client:*
RAM: 490 GB
CPU details:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
Filesystem: ext4

Below are the results for the local test:

Each cell is time(1) output; the percentages compare wall-clock (real) time
against the run without the parallel backup patch.

10 GB (10 tables, each around 1.05 GB)
  without patch:  real 0m27.016s   user 0m3.378s   sys 0m23.059s
  1 worker:       real 0m30.314s   user 0m3.575s   sys 0m22.946s   (12% performance decrease)
  2 workers:      real 0m20.400s   user 0m3.622s   sys 0m29.670s   (27% performance increase)
  4 workers:      real 0m15.331s   user 0m3.706s   sys 0m39.189s   (43% performance increase)
  8 workers:      real 0m15.094s   user 0m3.915s   sys 1m23.350s   (44% performance increase)

50 GB (50 tables, each around 1.05 GB)
  without patch:  real 2m11.049s   user 0m16.464s  sys 2m1.757s
  1 worker:       real 2m26.621s   user 0m18.497s  sys 2m4.792s    (21% performance decrease)
  2 workers:      real 1m9.581s    user 0m18.298s  sys 2m12.030s   (46% performance increase)
  4 workers:      real 0m53.894s   user 0m18.588s  sys 2m47.390s   (58% performance increase)
  8 workers:      real 0m55.373s   user 0m18.423s  sys 5m57.470s   (57% performance increase)

100 GB (100 tables, each around 1.05 GB)
  without patch:  real 4m4.776s    user 0m33.699s  sys 3m27.777s
  1 worker:       real 4m20.862s   user 0m35.753s  sys 3m28.262s   (6% performance decrease)
  2 workers:      real 2m37.411s   user 0m36.440s  sys 4m16.424s   (35% performance increase)
  4 workers:      real 1m49.503s   user 0m37.200s  sys 5m58.077s   (55% performance increase)
  8 workers:      real 1m36.762s   user 0m36.987s  sys 9m36.906s   (60% performance increase)

200 GB (200 tables, each around 1.05 GB)
  without patch:  real 10m34.998s  user 1m8.471s   sys 7m21.520s
  1 worker:       real 11m30.899s  user 1m12.933s  sys 8m14.496s   (8% performance decrease)
  2 workers:      real 6m8.481s    user 1m13.771s  sys 9m31.216s   (41% performance increase)
  4 workers:      real 4m2.403s    user 1m18.331s  sys 12m29.661s  (61% performance increase)
  8 workers:      real 4m3.768s    user 1m24.547s  sys 15m21.421s  (61% performance increase)
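
As a sanity check, the improvement percentages above can be recomputed from
the real times; a minimal sketch (values taken from the 10 GB local row):

```shell
# Percent change in wall-clock time relative to the unpatched baseline
pct() {
    awk -v base="$1" -v par="$2" 'BEGIN { printf "%.0f\n", (base - par) / base * 100 }'
}

# 10 GB local run: baseline 27.016s; 15.331s with 4 workers, 15.094s with 8
pct 27.016 15.331   # 43 (% faster with 4 workers)
pct 27.016 15.094   # 44 (% faster with 8 workers)
```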

Results for the remote test:

10 GB (10 tables, each around 1.05 GB)
  without patch:  real 1m36.829s   user 0m2.124s   sys 0m14.004s
  1 worker:       real 1m37.598s   user 0m3.272s   sys 0m11.110s   (0.8% performance decrease)
  2 workers:      real 1m36.753s   user 0m2.627s   sys 0m15.312s   (0.08% performance increase)
  4 workers:      real 1m37.212s   user 0m3.835s   sys 0m13.221s   (0.3% performance decrease)
  8 workers:      real 1m36.977s   user 0m4.475s   sys 0m17.937s   (0.1% performance decrease)

50 GB (50 tables, each around 1.05 GB)
  without patch:  real 7m54.211s   user 0m10.826s  sys 1m10.435s
  1 worker:       real 7m55.603s   user 0m16.535s  sys 1m8.147s    (0.2% performance decrease)
  2 workers:      real 7m53.499s   user 0m18.131s  sys 1m8.822s    (0.1% performance increase)
  4 workers:      real 7m54.687s   user 0m15.818s  sys 1m30.991s   (0.1% performance decrease)
  8 workers:      real 7m54.658s   user 0m20.783s  sys 1m34.460s   (0.1% performance decrease)

100 GB (100 tables, each around 1.05 GB)
  without patch:  real 15m45.776s  user 0m21.802s  sys 2m59.006s
  1 worker:       real 15m46.315s  user 0m32.499s  sys 2m47.245s   (0.05% performance decrease)
  2 workers:      real 15m46.065s  user 0m28.877s  sys 2m21.181s   (0.03% performance decrease)
  4 workers:      real 15m47.793s  user 0m30.932s  sys 2m36.708s   (0.2% performance decrease)
  8 workers:      real 15m47.129s  user 0m35.151s  sys 3m23.572s   (0.14% performance decrease)

200 GB (200 tables, each around 1.05 GB)
  without patch:  real 32m55.720s  user 0m50.602s  sys 5m38.875s
  1 worker:       real 31m30.602s  user 0m45.377s  sys 4m57.405s   (4% performance increase)
  2 workers:      real 31m30.214s  user 0m55.023s  sys 5m8.689s    (4% performance increase)
  4 workers:      real 31m31.187s  user 1m13.390s  sys 5m40.861s   (4% performance increase)
  8 workers:      real 31m31.729s  user 1m4.955s   sys 6m35.774s   (4% performance increase)

With the client and server on the same machine, the results show around a 50%
improvement in the parallel runs with 4 and 8 workers. We don't see much
additional improvement as more workers are added beyond that.

Whereas, with the client and server on different machines, we don't see any
major performance benefit. This matches the test results David Zhang posted
upthread.

We ran the 100 GB backup test with 4 parallel workers to observe CPU usage
and other information. We noticed that the server consumes almost 100% CPU
the whole time, and pg_stat_activity shows that the server is busy with
ClientWrite most of the time.

Attaching the captured output for:

1) Top command output on the server, sampled every 5 seconds

2) pg_stat_activity output, sampled every 5 seconds

3) Top command output on the client, sampled every 5 seconds
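
The pg_stat_activity samples were collected with a loop along these lines (a
sketch only; connection options and the output filename are assumptions):

```shell
# Sample server activity every 5 seconds for the duration of the backup.
# wait_event = ClientWrite means the backend is blocked writing to the client.
while sleep 5; do
    psql -X -c "SELECT now(), pid, backend_type, state, wait_event_type, wait_event
                FROM pg_stat_activity;"
done >> pg_stat_activity_5_100GB.txt
```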

Do let me know if anyone has further questions/inputs for the benchmarking.

Thanks to Rushabh Lathia for helping me with this testing.

On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david(dot)zhang(at)highgo(dot)ca>
> wrote:
> >
> > Hi,
> >
> > Here is the parallel backup performance test results with and without
> > the patch "parallel_backup_v15" on AWS cloud environment. Two
> > "t2.xlarge" machines were used: one for Postgres server and the other
> > one for pg_basebackup with the same machine configuration showing below.
> >
> > Machine configuration:
> > Instance Type :t2.xlarge
> > Volume type :io1
> > Memory (MiB) :16GB
> > vCPU # :4
> > Architecture :x86_64
> > IOP :6000
> > Database Size (GB) :108
> >
> > Performance test results:
> > without patch:
> > real 18m49.346s
> > user 1m24.178s
> > sys 7m2.966s
> >
> > 1 worker with patch:
> > real 18m43.201s
> > user 1m55.787s
> > sys 7m24.724s
> >
> > 2 worker with patch:
> > real 18m47.373s
> > user 2m22.970s
> > sys 11m23.891s
> >
> > 4 worker with patch:
> > real 18m46.878s
> > user 2m26.791s
> > sys 13m14.716s
> >
> > As required, I didn't have the pgbench running in parallel like we did
> > in the previous benchmark.
> >
>
> So, there doesn't seem to be any significant improvement in this
> scenario. Now, it is not clear why there was a significant
> improvement in the previous run where pgbench was also running
> simultaneously. I am not sure but maybe it is because when a lot of
> other backends were running (performing read-only workload) the
> backend that was responsible for doing backup was getting frequently
> scheduled out and it slowed down the overall backup process. And when
> we start using multiple backends for backup one or other backup
> process is always running making the overall backup faster. One idea
> to find this out is to check how much time backup takes when we run it
> with and without pgbench workload on HEAD (aka unpatched code). Even
> if what I am saying is true or there is some other reason due to which
> we are seeing speedup in some cases (where there is a concurrent
> workload), it might not make the case for using multiple backends for
> backup but still, it is good to find that information as it might help
> in designing this feature better.
>
> > The perf report files for both Postgres server and pg_basebackup sides
> > are attached.
> >
>
> It is not clear which functions are taking more time or for which
> functions time is reduced as function symbols are not present in the
> reports. I think you can refer
> "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
> take profiles and additionally use -fno-omit-frame-pointer during
> configure (you can use CFLAGS="-fno-omit-frame-pointer" during
> configure).
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.

Attachment Content-Type Size
top_server_5_100GB.txt text/plain 195.1 KB
top_client.txt text/plain 199.6 KB
pg_stat_activity_5_100GB.txt text/plain 951.3 KB
