From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LWLock optimization for multicore Power machines
Date: 2017-02-08 14:00:36
Message-ID: 1486562436.3288.17.camel@oopsware.de
Lists: pgsql-hackers

On Tuesday, 2017-02-07 at 16:48 +0300, Alexander Korotkov wrote:
> But win isn't as high as I observed earlier. And I wonder why absolute
> numbers are lower than in our earlier experiments. We used IBM E880
> which is actually two nodes with interconnect.

Did you run your tests on bare metal or were they also virtualized?

> However interconnect is not fast enough to make one PostgreSQL
> instance work on both nodes. Thus, used half of IBM E880 which has 4
> sockets and 32 physical cores. While you use IBM E850 which is really
> single node with 4 sockets and 48 physical cores. Thus, it seems that
> you have lower absolute numbers on more powerful hardware. That makes
> me uneasy and I think we probably don't get the best from this
> hardware.
> Just in case, do you use SMT=8?
Yes, SMT=8 was used.
The machine has 4 sockets with 8 cores each, at 3.7 GHz clock
frequency. The Ubuntu LPAR running on PowerVM isn't using all physical
cores; currently 28 cores are assigned (= 224 SMT threads). The other
cores are dedicated to the PowerVM hypervisor and a (very) small AIX
LPAR.
I've done more pgbench runs this morning with SMT-4, too. The fastest
result with master was:
SMT-4
transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 167306423
latency average = 0.179 ms
latency stddev = 0.072 ms
tps = 557685.144912 (including connections establishing)
tps = 557835.683204 (excluding connections establishing)
compared with SMT-8:
transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 173476449
latency average = 0.173 ms
latency stddev = 0.059 ms
tps = 578250.676019 (including connections establishing)
tps = 578412.159601 (excluding connections establishing)
and retried with lwlocks-power-3, SMT-4:
transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 185991995
latency average = 0.161 ms
latency stddev = 0.059 ms
tps = 619970.030069 (including connections establishing)
tps = 620112.263770 (excluding connections establishing)
...and SMT-8:
transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 185878717
latency average = 0.161 ms
latency stddev = 0.047 ms
tps = 619591.476154 (including connections establishing)
tps = 619655.867280 (excluding connections establishing)
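For reference, the runs above are consistent with a pgbench invocation
along these lines (a sketch reconstructed from the reported parameters;
the exact command isn't in the mail, and `bench` is a placeholder
database name):

```shell
# One-time initialization at scale factor 1000, as reported above.
# "bench" is a placeholder database name, not from the original mail.
pgbench -i -s 1000 bench

# Select-only (-S), prepared statements (-M prepared),
# 100 clients / 100 threads, 300 seconds per run.
pgbench -S -M prepared -c 100 -j 100 -T 300 bench
```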
Interestingly, the lwlocks patch seems to reduce the influence of the
SMT setting.
Side note: the system makes around 2 million context switches during
the benchmarks, e.g.
awk '{print $12;}' /tmp/vmstat.out
cs
10
2153533
2134864
2141623
2126845
2128330
2127454
2145325
2126769
2134492
2130246
2130071
2142660
2136077
2126783
2126107
2125823
2136511
2137752
2146307
2141127
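Averaging the 20 steady-state samples above (skipping the `cs` header
and the first since-boot value of 10) gives roughly 2.13 million
context switches per sample; a small sketch:

```shell
# Average the 20 steady-state context-switch samples from the
# vmstat output above (the "cs" header line and the first
# since-boot sample "10" are excluded).
printf '%s\n' \
  2153533 2134864 2141623 2126845 2128330 2127454 2145325 \
  2126769 2134492 2130246 2130071 2142660 2136077 2126783 \
  2126107 2125823 2136511 2137752 2146307 2141127 |
awk '{ sum += $1; n++ } END { printf "average cs: %.0f\n", sum / n }'
# prints "average cs: 2134935"
```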
I've also tried a more recent kernel this morning (4.4 vs. 4.8), but
this didn't change the picture. Is there anything more I can do?