Re: LWLock optimization for multicore Power machines

From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LWLock optimization for multicore Power machines
Date: 2017-02-08 14:00:36
Message-ID: 1486562436.3288.17.camel@oopsware.de
Lists: pgsql-hackers

On Tuesday, 2017-02-07 at 16:48 +0300, Alexander Korotkov wrote:
> But the win isn't as high as I observed earlier.  And I wonder why the
> absolute numbers are lower than in our earlier experiments.  We used
> an IBM E880, which is actually two nodes with interconnect.

Did you run your tests on bare metal, or were they also virtualized?

> However, the interconnect is not fast enough to make one PostgreSQL
> instance work on both nodes.  Thus, we used half of the IBM E880,
> which has 4 sockets and 32 physical cores.  You, on the other hand,
> use an IBM E850, which is really a single node with 4 sockets and 48
> physical cores.  Thus, it seems that you have lower absolute numbers
> on more powerful hardware.  That makes me uneasy, and I think we
> probably don't get the best from this hardware.  Just in case, do you
> use SMT=8?

Yes, SMT=8 was used.

The machine has 4 sockets with 8 cores each, at a 3.7 GHz clock
frequency. The Ubuntu LPAR running on PowerVM isn't using all physical
cores; currently 28 cores are assigned (= 224 SMT threads). The other
cores are dedicated to the PowerVM hypervisor and a (very) small AIX
LPAR.
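For reference, on Linux on Power the SMT mode is switched with
ppc64_cpu from powerpc-utils. A small sketch; since the tool needs root
on Power hardware, the commands are only echoed here rather than
executed:

```shell
# ppc64_cpu (powerpc-utils) controls the SMT mode on Linux on Power.
# Echoed instead of run, since it requires root on Power hardware.
query="ppc64_cpu --smt"      # prints the current SMT mode
set4="ppc64_cpu --smt=4"     # switch to SMT-4 before a run
set8="ppc64_cpu --smt=8"     # switch back to SMT-8
printf '%s\n' "$query" "$set4" "$set8"
```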

I've done more pgbench runs this morning with SMT-4, too; the fastest
result with master was:

SMT-4

transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 167306423
latency average = 0.179 ms
latency stddev = 0.072 ms
tps = 557685.144912 (including connections establishing)
tps = 557835.683204 (excluding connections establishing)

compared with SMT-8:

transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 173476449
latency average = 0.173 ms
latency stddev = 0.059 ms
tps = 578250.676019 (including connections establishing)
tps = 578412.159601 (excluding connections establishing)

and retried with lwlocks-power-3, SMT-4:

transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 185991995
latency average = 0.161 ms
latency stddev = 0.059 ms
tps = 619970.030069 (including connections establishing)
tps = 620112.263770 (excluding connections establishing)

...and SMT-8

transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 100
number of threads: 100
duration: 300 s
number of transactions actually processed: 185878717
latency average = 0.161 ms
latency stddev = 0.047 ms
tps = 619591.476154 (including connections establishing)
tps = 619655.867280 (excluding connections establishing)
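All four runs above report the same parameters, which correspond to a
pgbench invocation along these lines (a sketch reconstructed from the
output headers; the database name "pgbench" is an assumption):

```shell
# "<builtin: select only>" is pgbench's -S script, "query mode: prepared"
# is -M prepared; 100 clients and threads for 300 s against a database
# initialized with scale factor 1000 (pgbench -i -s 1000).
# The database name "pgbench" is an assumption.
cmd="pgbench -S -M prepared -c 100 -j 100 -T 300 pgbench"
echo "$cmd"
```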

Interestingly, the lwlocks patch seems to reduce the influence of the
SMT setting.
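That can be quantified from the "including connections establishing"
tps figures of the four runs above:

```shell
# SMT-8 over SMT-4 throughput, using the "including connections
# establishing" tps figures from the four runs above.
master_ratio=$(awk 'BEGIN { printf "%.3f", 578250.676019 / 557685.144912 }')
patched_ratio=$(awk 'BEGIN { printf "%.3f", 619591.476154 / 619970.030069 }')
echo "master:  $master_ratio"   # SMT-8 is ~3.7% faster than SMT-4
echo "patched: $patched_ratio"  # essentially no SMT-8 advantage left
```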

Side note: the system does around 2 million context switches per vmstat
sample during the benchmarks, e.g.:

awk '{print $12;}' /tmp/vmstat.out 

cs
10
2153533
2134864
2141623
2126845
2128330
2127454
2145325
2126769
2134492
2130246
2130071
2142660
2136077
2126783
2126107
2125823
2136511
2137752
2146307
2141127
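A quick average over that column (the first value is vmstat's
since-boot sample and is skipped; whether cs is per second depends on
the vmstat interval used):

```shell
# Average of the cs column above, skipping the since-boot first sample;
# %d truncates the fractional part.
avg=$(awk 'NR > 1 { sum += $1; n++ } END { printf "%d", sum / n }' <<'EOF'
10
2153533
2134864
2141623
2126845
2128330
2127454
2145325
2126769
2134492
2130246
2130071
2142660
2136077
2126783
2126107
2125823
2136511
2137752
2146307
2141127
EOF
)
echo "average context switches per sample: $avg"
```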

I've also tried a more recent kernel this morning (4.8 vs. 4.4), but
this didn't change the picture. Is there anything more I can do?
