Re: Speed up Clog Access by increasing CLOG buffers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-03-24 13:16:31
Message-ID: CAA4eK1KoGTUTWH=X3yqWAEqfHt0mKrBCMynY_sEoE4fEzPAfgg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 24, 2016 at 8:08 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
>
> On Thu, Mar 24, 2016 at 5:40 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Have you, in your evaluation of the performance of this patch, done
> > profiles over time? I.e. whether the performance benefits are there
> > immediately, or only after a significant amount of test time? Comparing
> > TPS over time, for both patched/unpatched looks relevant.
> >
>
> I have mainly done it with half-hour read-write tests. What do you want
> to observe via shorter tests? Short read-write runs sometimes give
> inconsistent data.
>

I have run tests on both the Intel and Power machines (configurations are
given at the end of this mail) and sampled the results at different time
intervals. The patch consistently shows more than 50% improvement on the
Power m/c at 128 clients and more than 29% improvement on the Intel m/c at
88 clients.

Non-default parameters
------------------------------------
max_connections = 300
shared_buffers = 8GB
min_wal_size = 10GB
max_wal_size = 15GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB

pgbench setup
------------------------
scale factor - 300
used *unlogged* tables : pgbench -i --unlogged-tables -s 300 ..
pgbench -M prepared tpc-b

Results on Intel m/c
--------------------------------
client-count - 88

Time (minutes)    Base    Patch      %
       5         39978    51858    29.71
      10         38169    52195    36.74
      20         36992    52173    41.03
      30         37042    52149    40.78

Results on power m/c
-----------------------------------
Client-count - 128

Time (minutes)    Base    Patch      %
       5         42479    65655    54.55
      10         41876    66050    57.72
      20         38099    65200    71.13
      30         37838    61908    63.61
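
The % column in the two tables above can be reproduced with a few lines of Python. This is only a sketch: it assumes that Base and Patch are pgbench TPS figures and that % is the relative gain of Patch over Base, truncated (not rounded) to two decimals, which is what matches the posted numbers. The function names are made up for illustration.

```python
# Rows copied from the tables above: (minutes, base, patch).
INTEL_88_CLIENTS = [
    (5, 39978, 51858),
    (10, 38169, 52195),
    (20, 36992, 52173),
    (30, 37042, 52149),
]
POWER_128_CLIENTS = [
    (5, 42479, 65655),
    (10, 41876, 66050),
    (20, 38099, 65200),
    (30, 37838, 61908),
]


def gain_hundredths(base, patch):
    """Gain of patch over base in hundredths of a percent, truncated.

    Integer arithmetic is used so the truncation is exact.
    """
    return (patch - base) * 10000 // base


def format_gain(base, patch):
    """Render the gain as a string with two decimals, e.g. '29.71'."""
    hundredths = gain_hundredths(base, patch)
    return f"{hundredths // 100}.{hundredths % 100:02d}"


if __name__ == "__main__":
    for name, rows in (("Intel, 88 clients", INTEL_88_CLIENTS),
                       ("Power, 128 clients", POWER_128_CLIENTS)):
        print(name)
        for minutes, base, patch in rows:
            print(f"{minutes:>4} {base:>6} {patch:>6} {format_gain(base, patch):>6}")
```

Read this way, the gain grows between the 5 and 20 minute marks on both machines, mostly because the Base figure drops while the Patch figure stays roughly flat.
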
>
> >
> > Even after changing to scale 500, the performance benefits on this,
> > older 2 socket, machine were minor; even though contention on the
> > ClogControlLock was the second most severe (after ProcArrayLock).
> >
>
> I have tried this patch mainly on an 8 socket machine with 300 & 1000 scale
> factors. I am hoping that you ran this test on unlogged tables. By the
> way, at what client count did you see these results?
>

Do you think the lack of a performance increase in your tests is due to the
m/c difference (sockets/CPU cores) or to the client count?

Intel m/c config (lscpu)
-------------------------------------
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
Stepping: 2
CPU MHz: 1064.000
BogoMIPS: 4266.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 0,65-71,96-103
NUMA node1 CPU(s): 72-79,104-111
NUMA node2 CPU(s): 80-87,112-119
NUMA node3 CPU(s): 88-95,120-127
NUMA node4 CPU(s): 1-8,33-40
NUMA node5 CPU(s): 9-16,41-48
NUMA node6 CPU(s): 17-24,49-56
NUMA node7 CPU(s): 25-32,57-64

Power m/c config (lscpu)
-------------------------------------
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191
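
For comparing the two machines, the lscpu figures above can be turned into clients-per-CPU ratios. A rough sketch, assuming the sockets/cores/threads values are taken as lscpu reports them (the dictionary layout and function name are only illustrative):

```python
# Topology values copied from the lscpu output above, plus the
# client counts used in the tests.
MACHINES = {
    "intel": {"sockets": 8, "cores_per_socket": 8, "threads_per_core": 2, "clients": 88},
    "power": {"sockets": 24, "cores_per_socket": 1, "threads_per_core": 8, "clients": 128},
}


def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Logical CPU count implied by the lscpu topology fields."""
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    for name, m in MACHINES.items():
        cpus = logical_cpus(m["sockets"], m["cores_per_socket"], m["threads_per_core"])
        cores = m["sockets"] * m["cores_per_socket"]
        print(f"{name}: {cpus} logical CPUs, {cores} cores, "
              f"{m['clients'] / cpus:.2f} clients per logical CPU, "
              f"{m['clients'] / cores:.2f} clients per core")
```

The products agree with the CPU(s) lines reported by lscpu (128 and 192), and in both tests the client count is below the number of logical CPUs.
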

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
