Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-12-22 13:29:13
Message-ID: 84c22fbb-b9c4-a02f-384b-b4feb2c67193@2ndquadrant.com
Lists: pgsql-hackers

Hi,

> The attached results show that:
>
> (a) master shows the same zig-zag behavior - No idea why this wasn't
> observed on the previous runs.
>
> (b) group_update actually seems to improve the situation, because the
> performance keeps stable up to 72 clients, while on master the
> fluctuation starts way earlier.
>
> I'll redo the tests with a newer kernel - this was on 3.10.x which is
> what Red Hat 7.2 uses, I'll try on 4.8.6. Then I'll try with the patches
> you submitted, if the 4.8.6 kernel does not help.
>
> Overall, I'm convinced this issue is unrelated to the patches.

I've been unable to rerun the tests on this hardware with a newer
kernel, so nothing new on the x86 front.

But as discussed with Amit in Tokyo at pgconf.asia, I got access to a
Power8e machine (IBM 8247-22L to be precise). It's a much smaller
machine compared to the x86 one, though - it only has 24 cores in 2
sockets, 128GB of RAM and less powerful storage, for example.

I've repeated a subset of the x86 tests and pushed them to

https://bitbucket.org/tvondra/power8-results-2

The new results are prefixed with "power-" and I've tried to put them
right next to the "same" x86 tests.

In all cases the patches significantly reduce the contention on
CLogControlLock, just like on x86, which is good and expected.

Otherwise the results are rather boring - no major regressions compared
to master, and all the patches perform almost exactly the same. Compare
for example this:

* http://tvondra.bitbucket.org/#dilip-300-unlogged-sync

* http://tvondra.bitbucket.org/#power-dilip-300-unlogged-sync

So the results seem much smoother compared to x86, and the performance
difference is roughly 3x, which matches the 24 vs. 72 cores.

For pgbench, the difference is much more significant, though:

* http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

* http://tvondra.bitbucket.org/#power-pgbench-300-unlogged-sync-skip

So, we're doing ~40k on Power8, but 220k on x86 (which is ~5.5x more on
3x the cores, so almost double the per-core throughput). My first guess
was that this is due to the x86 machine having a better I/O subsystem, so
I've rerun the tests with the data directory in tmpfs, but that produced
almost the same results.
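
For reference, the per-core comparison can be sketched as below. This is
only back-of-the-envelope arithmetic using the approximate figures quoted
in this mail (~40k on 24 cores, ~220k on 72 cores); the variable names are
made up for illustration and the inputs are rounded readings off the
charts, not exact measurements.

```python
# Approximate pgbench throughput figures and core counts quoted in this mail.
power8_throughput = 40_000    # ~40k on the Power8 machine
x86_throughput = 220_000      # ~220k on the x86 machine
power8_cores = 24
x86_cores = 72

# Overall throughput ratio between the two machines.
total_ratio = x86_throughput / power8_throughput

# Ratio of core counts (72 vs. 24).
core_ratio = x86_cores / power8_cores

# Per-core throughput on each machine, and the ratio between them.
power8_per_core = power8_throughput / power8_cores
x86_per_core = x86_throughput / x86_cores
per_core_ratio = x86_per_core / power8_per_core

print(f"total throughput ratio: {total_ratio:.2f}x")
print(f"core count ratio:       {core_ratio:.2f}x")
print(f"per-core ratio:         {per_core_ratio:.2f}x")
```

With these rounded inputs the per-core ratio is the total ratio divided
by the core ratio, i.e. somewhat under 2x in favour of x86.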

Of course, this observation is unrelated to this patch.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
