Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-09-22 23:44:30
Message-ID: 26b69fb2-fa4d-530c-7783-1cb9d952c4e5@2ndquadrant.com
Lists: pgsql-hackers

On 09/21/2016 08:04 AM, Amit Kapila wrote:
> On Wed, Sep 21, 2016 at 3:48 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
...
>
>> I'll repeat the test on the 4-socket machine with a newer kernel,
>> but that's probably the last benchmark I'll do for this patch for
>> now.
>>

Attached are results from benchmarks running on kernel 4.5 (instead of
the old 3.2.80). I've only done synchronous_commit=on, and I've added a
few client counts (mostly at the lower end). I've pushed the data to
the git repository, see

git push --set-upstream origin master
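
For the record, each data point came from runs of roughly the shape
sketched below. This is only an illustration - the flags are ordinary
pgbench options, but the run duration, query mode and database name
here are my assumptions, not copied from the actual scripts (see the
attached spreadsheet for the real parameters):

  # hypothetical sketch of one sweep over the client counts;
  # synchronous_commit=on was set in postgresql.conf for all runs;
  # -T 300, -M prepared and the "bench" database are assumptions
  for c in 1 8 16 32 64 128 192; do
      pgbench -M prepared -c $c -j $c -T 300 bench > pgbench-$c.log
  done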

The summary looks like this (showing both the 3.2.80 and 4.5.5 results;
columns are client counts, values are throughput in tps):

1) Dilip's workload

3.2.80                16     32     64    128    192
-----------------------------------------------------
master             26138  37790  38492  13653   8337
granular-locking   25661  38586  40692  14535   8311
no-content-lock    25653  39059  41169  14370   8373
group-update       26472  39170  42126  18923   8366

4.5.5                  1      8     16     32     64    128    192
-------------------------------------------------------------------
granular-locking    4050  23048  27969  32076  34874  36555  37710
no-content-lock     4025  23166  28430  33032  35214  37576  39191
group-update        4002  23037  28008  32492  35161  36836  38850
master              3968  22883  27437  32217  34823  36668  38073

2) pgbench

3.2.80                16     32     64    128    192
-----------------------------------------------------
master             22904  36077  41295  35574   8297
granular-locking   23323  36254  42446  43909   8959
no-content-lock    23304  36670  42606  48440   8813
group-update       23127  36696  41859  46693   8345

4.5.5                  1      8     16     32     64    128    192
-------------------------------------------------------------------
granular-locking    3116  19235  27388  29150  31905  34105  36359
no-content-lock     3206  19071  27492  29178  32009  34140  36321
group-update        3195  19104  26888  29236  32140  33953  35901
master              3136  18650  26249  28731  31515  33328  35243

The 4.5 kernel clearly changed the results in several ways:

(a) Compared to the results from the 3.2.80 kernel, some numbers
improved and some got worse. For example, with 16 clients pgbench did
~23k tps on 3.2.80 but ~26k tps on 4.5.5, while with 64 clients the
throughput dropped from ~41k tps to ~32k tps (on master).

(b) The drop above 64 clients is gone - on 3.2.80 the throughput
collapsed very quickly to only ~8k tps with 192 clients, while on 4.5.5
the tps actually continues to increase and we get ~35k tps with 192
clients.

(c) Although it's not visible in the summary tables above, 4.5.5 almost
perfectly eliminated run-to-run fluctuations. For example, where 3.2.80
produced these results (10 runs with the same parameters):

12118 11610 27939 11771 18065
12152 14375 10983 13614 11077

we get this on 4.5.5:

37354 37650 37371 37190 37233
38498 37166 36862 37928 38509

Notice how much more even the 4.5.5 results are, compared to 3.2.80.
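
To put a number on that, here's a trivial bit of shell over the two
10-run samples above (just mean and sample stddev, nothing
benchmark-specific):

  # mean and stddev of the 10 runs on each kernel
  for runs in "12118 11610 27939 11771 18065 12152 14375 10983 13614 11077" \
              "37354 37650 37371 37190 37233 38498 37166 36862 37928 38509"; do
      echo $runs | tr ' ' '\n' |
          awk '{ s += $1; ss += $1 * $1; n++ }
               END { m = s / n; sd = sqrt((ss - n * m * m) / (n - 1))
                     printf "mean = %.0f tps, stddev = %.0f (%.1f%% of mean)\n",
                            m, sd, 100 * sd / m }'
  done

That works out to a stddev of roughly 36% of the mean on 3.2.80 vs.
roughly 1.5% on 4.5.5.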

(d) There's no sign of any benefit from any of the patches. They only
helped at >= 128 clients, but that's exactly where the tps dropped on
3.2.80 - apparently 4.5.5 fixes that, and with it the benefit is gone.

It's a bit annoying that after upgrading from 3.2.80 to 4.5.5, the
performance with 32 and 64 clients dropped quite noticeably - by more
than 10% (e.g. pgbench on master goes from ~36k tps to ~29k tps with 32
clients, a drop of about 20%). I believe that might be a kernel
regression, but perhaps it's the price for the improved scalability at
higher client counts.

That of course raises the question of what kernel version is running on
the machine used by Dilip (i.e. cthulhu). It's a Power machine, though,
so I'm not sure how much the kernel matters on it.
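
For whoever has access to cthulhu, checking that is a one-liner:

  uname -r    # prints the running kernel release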

I'll ask someone else with access to this particular machine to repeat
the tests, as I have a nagging suspicion that I've missed something
important when compiling / running the benchmarks. I'll also retry the
benchmarks on 3.2.80 to see if I get the same numbers.

>
> Okay, but I think it is better to see the results between 64~128
> client count and maybe greater than 128 client counts, because it is
> clear that patch won't improve performance below that.
>

There are results for 64, 128 and 192 clients. Why should we care about
numbers in between? How likely (and useful) would it be to get an
improvement with 96 clients, but no improvement with 64 or 128 clients?

>>
>> I agree with Robert that the cases the patch is supposed to
>> improve are a bit contrived because of the very high client
>> counts.
>>
>
> No issues, I have already explained why I think it is important to
> reduce the remaining CLOGControlLock contention in yesterday's and
> this mail. If none of you is convinced, then I think we have no
> choice but to drop this patch.
>

I agree it's useful to reduce lock contention in general, but
considering the last set of benchmarks shows no benefit with recent
kernel, I think we really need a better understanding of what's going
on, what workloads / systems it's supposed to improve, etc.

I don't dare suggest rejecting the patch, but I don't see how we could
commit any of the patches at this point. So perhaps "returned with
feedback", with a resubmission in the next CF (along with an analysis
of the workloads it improves), would be appropriate.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
results.ods application/vnd.oasis.opendocument.spreadsheet 58.1 KB
