From: | David Zhang <david(dot)zhang(at)highgo(dot)ca> |
---|---|
To: | Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP/PoC for parallel backup |
Date: | 2020-04-30 06:26:16 |
Message-ID: | 52a86c49-ff6c-3825-1d25-20596907b7a7@highgo.ca |
Lists: | pgsql-hackers |
Hi,
Thanks a lot for sharing the test results. Here are our test results
using perf on three AWS t2.xlarge instances with the configuration below.
Machine configuration:
Instance type: t2.xlarge
Volume type: io1
Memory: 16 GB
vCPUs: 4
Architecture: x86_64
IOPS: 6000
Database size: 45 GB (server)
case 1: postgres server: without patch and without load
* Disk I/O:
# Samples: 342K of event 'block:block_rq_insert'
# Event count (approx.): 342834
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.65% postgres [kernel.kallsyms] [k] __elv_add_request
2.27% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .............................................
#
64.73% swapper [kernel.kallsyms] [k] native_safe_halt
10.89% postgres [vdso] [.] __vdso_gettimeofday
5.64% postgres [kernel.kallsyms] [k] do_syscall_64
5.43% postgres libpthread-2.26.so [.] __libc_recv
1.72% postgres [kernel.kallsyms] [k] pvclock_clocksource_read
* Network:
# Samples: 2M of event 'skb:consume_skb'
# Event count (approx.): 2739785
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
91.58% swapper [kernel.kallsyms] [k] consume_skb
7.09% postgres [kernel.kallsyms] [k] consume_skb
0.61% kswapd0 [kernel.kallsyms] [k] consume_skb
0.44% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
case 1: pg_basebackup client: without patch and without load
* Disk I/O:
# Samples: 371K of event 'block:block_rq_insert'
# Event count (approx.): 371362
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
96.78% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
2.82% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.29% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.09% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 3M of event 'cpu-clock'
# Event count (approx.): 903527000000
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .............................................
#
87.99% swapper [kernel.kallsyms] [k] native_safe_halt
3.14% swapper [kernel.kallsyms] [k] __lock_text_start
0.48% swapper [kernel.kallsyms] [k] __softirqentry_text_start
0.37% pg_basebackup [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.35% swapper [kernel.kallsyms] [k] do_csum
* Network:
# Samples: 12M of event 'skb:consume_skb'
# Event count (approx.): 12260713
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
95.12% swapper [kernel.kallsyms] [k] consume_skb
3.23% pg_basebackup [kernel.kallsyms] [k] consume_skb
0.83% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.45% kswapd0 [kernel.kallsyms] [k] consume_skb
case 2: postgres server: with patch and with load, 4 backup workers on
client side
* Disk I/O:
# Samples: 3M of event 'block:block_rq_insert'
# Event count (approx.): 3634542
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
98.88% postgres [kernel.kallsyms] [k] __elv_add_request
0.66% perf [kernel.kallsyms] [k] __elv_add_request
0.42% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.01% sshd [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 9M of event 'cpu-clock'
# Event count (approx.): 2299129250000
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .............................................
#
52.73% swapper [kernel.kallsyms] [k] native_safe_halt
8.31% postgres [vdso] [.] __vdso_gettimeofday
4.46% postgres [kernel.kallsyms] [k] do_syscall_64
4.16% postgres libpthread-2.26.so [.] __libc_recv
1.58% postgres [kernel.kallsyms] [k] __lock_text_start
1.52% postgres [kernel.kallsyms] [k] pvclock_clocksource_read
0.81% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
* Network:
# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6048795
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
85.81% postgres [kernel.kallsyms] [k] consume_skb
12.03% swapper [kernel.kallsyms] [k] consume_skb
0.97% postgres [kernel.kallsyms] [k] __consume_stateless_skb
0.85% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
0.24% perf [kernel.kallsyms] [k] consume_skb
case 2: pg_basebackup 4 workers: with patch and with load
* Disk I/O:
# Samples: 372K of event 'block:block_rq_insert'
# Event count (approx.): 372360
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.26% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
1.45% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.95% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.14% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 4M of event 'cpu-clock'
# Event count (approx.): 1234071000000
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .............................................
#
89.25% swapper [kernel.kallsyms] [k] native_safe_halt
0.93% pg_basebackup [kernel.kallsyms] [k] __lock_text_start
0.91% swapper [kernel.kallsyms] [k] __lock_text_start
0.69% pg_basebackup [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.45% swapper [kernel.kallsyms] [k] do_csum
* Network:
# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6449013
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
90.28% pg_basebackup [kernel.kallsyms] [k] consume_skb
9.09% swapper [kernel.kallsyms] [k] consume_skb
0.29% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.21% sshd [kernel.kallsyms] [k] consume_skb
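The report excerpts above can be collected with perf's tracepoint and
sampling events. The sketch below only assembles the command lines (the
event names match the report headers, but the exact options used are an
assumption); recording system-wide needs root.

```shell
# Sketch (assumed options): collect perf data for the three report types
# while a backup runs.  Commands are printed rather than executed, since
# recording requires root and a running server.
DURATION=60   # seconds to sample while pg_basebackup is running
CMDS=""
for EVENT in block:block_rq_insert cpu-clock skb:consume_skb; do
  OUT="perf_${EVENT##*:}.data"        # e.g. perf_consume_skb.data
  CMDS="$CMDS
perf record -a -g -e $EVENT -o $OUT -- sleep $DURATION
perf report -i $OUT --stdio"
done
echo "$CMDS"
```

Drop the variable indirection and run the `perf record`/`perf report`
pairs directly to reproduce reports in this form.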
The detailed perf reports are attached, covering the different scenarios:
without the patch (with and without load, for both server and client) and
with the patch (with and without load, for 1, 2, 4 and 8 workers, for both
server and client). The file names should be self-explanatory.
Let me know if more information is required.
Best regards,
David
On 2020-04-29 5:41 a.m., Suraj Kharage wrote:
> Hi,
>
> We at EnterpriseDB did some performance testing around this
> parallel backup to check how this is beneficial and below are the
> results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> *Machine details: *
>
> 1: Server (on which local backups performed and used as server for
> remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
> *Server:*
>
> RAM:500 GB
> CPU details:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 128
> On-line CPU(s) list: 0-127
> Thread(s) per core: 2
> Core(s) per socket: 8
> Socket(s): 8
> NUMA node(s): 8
> Filesystem:ext4
>
>
> *Client:*
> RAM:490 GB
> CPU details:
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 192
> On-line CPU(s) list: 0-191
> Thread(s) per core: 8
> Core(s) per socket: 1
> Socket(s): 24
> Filesystem:ext4
>
> Below are the results for the local test:
>
> Data size | Without patch | 1 worker | 2 workers | 4 workers | 8 workers
> ---|---|---|---|---|---
> 10 GB (10 tables, ~1.05 GB each) | real 0m27.016s, user 0m3.378s, sys 0m23.059s | real 0m30.314s, user 0m3.575s, sys 0m22.946s (12% slower) | real 0m20.400s, user 0m3.622s, sys 0m29.670s (27% faster) | real 0m15.331s, user 0m3.706s, sys 0m39.189s (43% faster) | real 0m15.094s, user 0m3.915s, sys 1m23.350s (44% faster)
> 50 GB (50 tables, ~1.05 GB each) | real 2m11.049s, user 0m16.464s, sys 2m1.757s | real 2m26.621s, user 0m18.497s, sys 2m4.792s (21% slower) | real 1m9.581s, user 0m18.298s, sys 2m12.030s (46% faster) | real 0m53.894s, user 0m18.588s, sys 2m47.390s (58% faster) | real 0m55.373s, user 0m18.423s, sys 5m57.470s (57% faster)
> 100 GB (100 tables, ~1.05 GB each) | real 4m4.776s, user 0m33.699s, sys 3m27.777s | real 4m20.862s, user 0m35.753s, sys 3m28.262s (6% slower) | real 2m37.411s, user 0m36.440s, sys 4m16.424s (35% faster) | real 1m49.503s, user 0m37.200s, sys 5m58.077s (55% faster) | real 1m36.762s, user 0m36.987s, sys 9m36.906s (60% faster)
> 200 GB (200 tables, ~1.05 GB each) | real 10m34.998s, user 1m8.471s, sys 7m21.520s | real 11m30.899s, user 1m12.933s, sys 8m14.496s (8% slower) | real 6m8.481s, user 1m13.771s, sys 9m31.216s (41% faster) | real 4m2.403s, user 1m18.331s, sys 12m29.661s (61% faster) | real 4m3.768s, user 1m24.547s, sys 15m21.421s (61% faster)
>
> (Percentages are the change in real time relative to the unpatched backup.)
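As a quick cross-check of the percentage columns, the `real` figures can
be converted to seconds and compared; a small sketch (the parsing assumes
the `XmY.ZZZs` format shown in the results):

```shell
# Convert a `time` figure such as 2m11.049s to seconds, then compute the
# percentage change of a candidate run against the unpatched baseline.
to_secs() {
  local m=${1%%m*} rest=${1#*m}
  awk -v m="$m" -v s="${rest%s}" 'BEGIN { printf "%.3f", m * 60 + s }'
}
pct_change() {  # usage: pct_change <baseline> <candidate>; negative = slower
  awk -v b="$(to_secs "$1")" -v c="$(to_secs "$2")" \
      'BEGIN { printf "%d\n", (b - c) / b * 100 }'
}
pct_change 0m27.016s 0m15.331s   # 10 GB local, 4 workers -> 43
pct_change 0m27.016s 0m30.314s   # 10 GB local, 1 worker  -> -12
```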
>
>
> Results for the remote test:
>
> Data size | Without patch | 1 worker | 2 workers | 4 workers | 8 workers
> ---|---|---|---|---|---
> 10 GB (10 tables, ~1.05 GB each) | real 1m36.829s, user 0m2.124s, sys 0m14.004s | real 1m37.598s, user 0m3.272s, sys 0m11.110s (0.8% slower) | real 1m36.753s, user 0m2.627s, sys 0m15.312s (0.08% faster) | real 1m37.212s, user 0m3.835s, sys 0m13.221s (0.3% slower) | real 1m36.977s, user 0m4.475s, sys 0m17.937s (0.1% slower)
> 50 GB (50 tables, ~1.05 GB each) | real 7m54.211s, user 0m10.826s, sys 1m10.435s | real 7m55.603s, user 0m16.535s, sys 1m8.147s (0.2% slower) | real 7m53.499s, user 0m18.131s, sys 1m8.822s (0.1% faster) | real 7m54.687s, user 0m15.818s, sys 1m30.991s (0.1% slower) | real 7m54.658s, user 0m20.783s, sys 1m34.460s (0.1% slower)
> 100 GB (100 tables, ~1.05 GB each) | real 15m45.776s, user 0m21.802s, sys 2m59.006s | real 15m46.315s, user 0m32.499s, sys 2m47.245s (0.05% slower) | real 15m46.065s, user 0m28.877s, sys 2m21.181s (0.03% slower) | real 15m47.793s, user 0m30.932s, sys 2m36.708s (0.2% slower) | real 15m47.129s, user 0m35.151s, sys 3m23.572s (0.14% slower)
> 200 GB (200 tables, ~1.05 GB each) | real 32m55.720s, user 0m50.602s, sys 5m38.875s | real 31m30.602s, user 0m45.377s, sys 4m57.405s (4% faster) | real 31m30.214s, user 0m55.023s, sys 5m8.689s (4% faster) | real 31m31.187s, user 1m13.390s, sys 5m40.861s (4% faster) | real 31m31.729s, user 1m4.955s, sys 6m35.774s (4% faster)
>
> (Percentages are the change in real time relative to the unpatched backup.)
>
>
>
> With client and server on the same machine, the results show around 50%
> improvement in the parallel runs with 4 and 8 workers. We don't see a
> further large improvement as more workers are added.
>
>
> Whereas, when the client and server are on different machines, we don't
> see any major performance benefit. These results match the ones David
> Zhang posted upthread.
>
>
>
> We ran the test for a 100GB backup with 4 parallel workers to observe
> CPU usage and other metrics. We noticed that the server consumes almost
> 100% CPU the whole time, and pg_stat_activity shows the server busy
> with ClientWrite most of the time.
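The ClientWrite observation can be reproduced by polling the wait-event
columns of pg_stat_activity while the backup runs; a sketch (the host and
polling loop are placeholders):

```shell
# ClientWrite means the server is blocked sending to the client, i.e. the
# bottleneck at that moment is the network/client side, not server disk
# or CPU.  Base backups run in walsender processes.
SQL="SELECT pid, state, wait_event_type, wait_event
       FROM pg_stat_activity
      WHERE backend_type = 'walsender';"
echo "$SQL"   # printed here; in practice run it in a loop, e.g.:
# while sleep 5; do psql -h server -X -c "$SQL"; done
```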
>
>
> Attaching the captured output for:
>
> 1) top output on the server, every 5 seconds
>
> 2) pg_stat_activity output, every 5 seconds
>
> 3) top output on the client, every 5 seconds
>
>
> Do let me know if anyone has further questions/inputs for the
> benchmarking.
>
>
> Thanks to Rushabh Lathia for helping me with this testing.
>
> On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com
> <mailto:amit(dot)kapila16(at)gmail(dot)com>> wrote:
>
> On Mon, Apr 27, 2020 at 10:23 PM David Zhang
> <david(dot)zhang(at)highgo(dot)ca <mailto:david(dot)zhang(at)highgo(dot)ca>> wrote:
> >
> > Hi,
> >
> > Here is the parallel backup performance test results with and
> without
> > the patch "parallel_backup_v15" on AWS cloud environment. Two
> > "t2.xlarge" machines were used: one for Postgres server and the
> other
> > one for pg_basebackup with the same machine configuration
> showing below.
> >
> > Machine configuration:
> > Instance Type :t2.xlarge
> > Volume type :io1
> > Memory (MiB) :16GB
> > vCPU # :4
> > Architecture :x86_64
> > IOP :6000
> > Database Size (GB) :108
> >
> > Performance test results:
> > without patch:
> > real 18m49.346s
> > user 1m24.178s
> > sys 7m2.966s
> >
> > 1 worker with patch:
> > real 18m43.201s
> > user 1m55.787s
> > sys 7m24.724s
> >
> > 2 worker with patch:
> > real 18m47.373s
> > user 2m22.970s
> > sys 11m23.891s
> >
> > 4 worker with patch:
> > real 18m46.878s
> > user 2m26.791s
> > sys 13m14.716s
> >
> > As required, I didn't have the pgbench running in parallel like
> we did
> > in the previous benchmark.
> >
>
> So, there doesn't seem to be any significant improvement in this
> scenario. Now, it is not clear why there was a significant
> improvement in the previous run where pgbench was also running
> simultaneously. I am not sure but maybe it is because when a lot of
> other backends were running (performing read-only workload) the
> backend that was responsible for doing backup was getting frequently
> scheduled out and it slowed down the overall backup process. And when
> we start using multiple backends for backup one or other backup
> process is always running making the overall backup faster. One idea
> to find this out is to check how much time backup takes when we run it
> with and without pgbench workload on HEAD (aka unpatched code). Even
> if what I am saying is true or there is some other reason due to which
> we are seeing speedup in some cases (where there is a concurrent
> workload), it might not make the case for using multiple backends for
> backup but still, it is good to find that information as it might help
> in designing this feature better.
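The comparison suggested above can be scripted along these lines (host,
database name, and durations are placeholders; the commands are only
printed here, not executed):

```shell
# Time an unpatched pg_basebackup against an idle server, then against
# the same server under a read-only pgbench load.
HOST=server.example; DEST=/tmp/bkp
IDLE="time pg_basebackup -h $HOST -D $DEST -Ft -X none"
LOADED="pgbench -h $HOST -S -c 16 -T 1200 bench &
time pg_basebackup -h $HOST -D $DEST -Ft -X none"
printf '%s\n---\n%s\n' "$IDLE" "$LOADED"
```

If the loaded run is markedly slower on HEAD, that would support the
scheduling explanation above.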
>
> > The perf report files for both Postgres server and pg_basebackup
> sides
> > are attached.
> >
>
> It is not clear which functions are taking more time or for which
> functions time is reduced as function symbols are not present in the
> reports. I think you can refer to
> "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
> take profiles, and additionally build with -fno-omit-frame-pointer
> (you can pass CFLAGS="-fno-omit-frame-pointer" to configure).
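A configure invocation along those lines might look like this (the prefix
and optimization flags are examples, not a prescribed build):

```shell
# Build with frame pointers kept so perf call graphs and symbols resolve.
CFG='./configure CFLAGS="-O2 -fno-omit-frame-pointer" --prefix=$HOME/pg-perf'
echo "$CFG"
# afterwards: make -j4 && make install, then re-run the backup under
# perf record -g so call chains are usable in perf report.
```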
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>
>
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
--
David
Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
Attachment | Content-Type | Size |
---|---|---|
perf_report.tar.gz | application/x-gzip | 939.5 KB |