Re: LWLock optimization for multicore Power machines

From:	Bernd Helmle <mailings(at)oopsware(dot)de>
To:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: LWLock optimization for multicore Power machines
Date:	2017-02-13 14:16:35
Message-ID:	1486995395.2959.11.camel@oopsware.de
Lists:	pgsql-hackers

On Saturday, 2017-02-11 at 15:42 +0300, Alexander Korotkov wrote:
> Thus, I see reasons why in your tests absolute results are lower than
> in my previous tests.
> 1. You use 28 physical cores while I was using 32 physical cores.
> 2. You run tests in PowerVM while I was running tests on bare metal.
> PowerVM could have some overhead.
> 3. I guess you run pgbench on the same machine, while in my tests
> pgbench was running on another node of the IBM E880.
>

Yeah, pgbench was running locally. Maybe we can get some resources to
run it remotely.

Interesting side note:

If you run two Postgres instances concurrently with the same pgbench
parameters, each instance gets nearly the same transaction throughput
as a single instance running alone, e.g.

single instance:

transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 112
number of threads: 112
duration: 300 s
number of transactions actually processed: 121523797
latency average = 0.276 ms
latency stddev = 0.096 ms
tps = 405075.282309 (including connections establishing)
tps = 405114.299174 (excluding connections establishing)
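
As a quick sanity check on the report above, the reported tps can be cross-checked against the other figures in the same output. This is only a rough sketch: the variable names are made up for illustration, and the latency-based estimate assumes every client keeps exactly one query in flight for the whole run.

```python
# Figures copied from the single-instance pgbench report above.
clients = 112
duration_s = 300
transactions = 121523797
latency_avg_ms = 0.276

# Throughput derived from the raw transaction count.
tps_from_count = transactions / duration_s

# Throughput derived from client count and average latency
# (assumes one in-flight query per client, no idle time).
tps_from_latency = clients / (latency_avg_ms / 1000.0)

print(f"tps from transaction count: {tps_from_count:.0f}")
print(f"tps from clients / latency:  {tps_from_latency:.0f}")
```

Both estimates land within a fraction of a percent of the reported ~405k tps, so the report is internally consistent.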

instance-1 and instance-2, run concurrently:

transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 112
number of threads: 56
duration: 300 s
number of transactions actually processed: 120645351
latency average = 0.278 ms
latency stddev = 0.158 ms
tps = 402148.536087 (including connections establishing)
tps = 402199.952824 (excluding connections establishing)

transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 112
number of threads: 56
duration: 300 s
number of transactions actually processed: 121959772
latency average = 0.275 ms
latency stddev = 0.110 ms
tps = 406530.139080 (including connections establishing)
tps = 406556.658638 (excluding connections establishing)

So it looks like the machine has plenty of headroom, but a single
PostgreSQL instance is hitting a limit somewhere.
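
To put a number on that, here is a small sketch that adds up the "excluding connections establishing" tps of the two concurrent instances and compares the sum with the single-instance run. The variable names are just for illustration; the figures are copied from the reports above.

```python
# tps (excluding connections establishing) from the pgbench reports above.
single_tps = 405114.299174
concurrent_tps = [402199.952824, 406556.658638]

# Aggregate throughput of the machine with two instances running.
aggregate_tps = sum(concurrent_tps)

# How much more work the machine does with two instances than with one.
scaling = aggregate_tps / single_tps

print(f"aggregate tps with two instances: {aggregate_tps:.0f}")
print(f"scaling vs. single instance:      {scaling:.2f}x")
```

The machine delivers roughly twice the single-instance throughput with two instances, which is what points at a per-instance bottleneck rather than a hardware limit.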

> Therefore, with lower absolute numbers in your tests, the win from the
> LWLock optimization is also lower. That is understandable. But the win
> from the LWLock optimization is clearly visible and definitely exceeds
> the variation.
>
> I think it would make sense to run more kinds of tests. Could you try
> the set of tests provided by Tomas Vondra?
> Even if we don't see a win in some of the tests, it would still be
> valuable to see that there is no regression there.

Unfortunately, there are some tests for AIX scheduled, which will assign
resources to that LPAR... I've just talked to the people responsible for
the machine, and we can get more time for the Linux tests ;)

In response to

Re: LWLock optimization for multicore Power machines at 2017-02-11 12:42:44 from Alexander Korotkov

Responses

Re: LWLock optimization for multicore Power machines at 2017-02-13 19:17:08 from Tomas Vondra
