Re: Speed up Clog Access by increasing CLOG buffers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-09-23 03:10:37
Message-ID: CAA4eK1Kshqxa1birZxocNEWJROaiasUNycL+43b8JTTq+O2Vog@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 09/21/2016 08:04 AM, Amit Kapila wrote:
>>
>
> (c) Although it's not visible in the results, 4.5.5 almost perfectly
> eliminated the fluctuations in the results. For example when 3.2.80 produced
> these results (10 runs with the same parameters):
>
> 12118 11610 27939 11771 18065
> 12152 14375 10983 13614 11077
>
> we get this on 4.5.5
>
> 37354 37650 37371 37190 37233
> 38498 37166 36862 37928 38509
>
> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>

How long was each run? I generally do half-hour runs to get stable results.
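
One way to put a number on the run-to-run fluctuation in the two result sets quoted above is the coefficient of variation (sample standard deviation divided by the mean). The following is only a minimal sketch, not something that was run as part of this thread: the tps values are copied from the quoted message, while the names `runs`, `coefficient_of_variation` and `summary` are made up for illustration.

```python
from statistics import mean, stdev

# tps values copied from the quoted message (10 runs each, same parameters).
runs = {
    "3.2.80": [12118, 11610, 27939, 11771, 18065,
               12152, 14375, 10983, 13614, 11077],
    "4.5.5": [37354, 37650, 37371, 37190, 37233,
              38498, 37166, 36862, 37928, 38509],
}


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(samples) / mean(samples)


# Map each kernel version to (mean tps, coefficient of variation).
summary = {
    kernel: (mean(tps), coefficient_of_variation(tps))
    for kernel, tps in runs.items()
}

for kernel, (avg, cv) in summary.items():
    print(f"{kernel}: mean tps = {avg:.1f}, CV = {cv:.1%}")
```

A much lower coefficient of variation for one kernel means its runs are more repeatable, which is separate from whether its mean tps is higher. Longer runs, such as the half-hour runs mentioned above, would be expected to reduce this figure further.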

> (d) There's no sign of any benefit from any of the patches (it was only
> helpful >= 128 clients, but that's where the tps actually dropped on 3.2.80
> - apparently 4.5.5 fixes that and the benefit is gone).
>
> It's a bit annoying that after upgrading from 3.2.80 to 4.5.5, the
> performance with 32 and 64 clients dropped quite noticeably (by more than
> 10%). I believe that might be a kernel regression, but perhaps it's a price
> for improved scalability for higher client counts.
>
> It of course begs the question what kernel version is running on the machine
> used by Dilip (i.e. cthulhu)? Although it's a Power machine, so I'm not sure
> how much the kernel matters on it.
>

cthulhu is an x86 machine and the kernel version is 3.10. Seeing the
above results, I think the kernel version does matter, but does that mean
we should ignore the benefits we are seeing on somewhat older kernel
versions? I think the right answer here is to do some experiments which
can show the actual contention, as suggested by Robert and you.

> I'll ask someone else with access to this particular machine to repeat the
> tests, as I have a nagging suspicion that I've missed something important
> when compiling / running the benchmarks. I'll also retry the benchmarks on
> 3.2.80 to see if I get the same numbers.
>
>>
>> Okay, but I think it is better to see the results between 64~128
>> client count and maybe greater than 128 client counts, because it is
>> clear that the patch won't improve performance below that.
>>
>
> There are results for 64, 128 and 192 clients. Why should we care about
> numbers in between? How likely (and useful) would it be to get improvement
> with 96 clients, but no improvement for 64 or 128 clients?
>

The only point was to see from where we start seeing improvement.
Saying that the TPS has improved from >=72 clients is different from
saying that it has improved from >=128.

>> No issues, I have already explained why I think it is important to
>> reduce the remaining CLOGControlLock contention in yesterday's and
>> this mail. If none of you is convinced, then I think we have no
>> choice but to drop this patch.
>>
>
> I agree it's useful to reduce lock contention in general, but considering
> the last set of benchmarks shows no benefit with recent kernel, I think we
> really need a better understanding of what's going on, what workloads /
> systems it's supposed to improve, etc.
>
> I don't dare to suggest rejecting the patch, but I don't see how we could
> commit any of the patches at this point. So perhaps "returned with feedback"
> and resubmitting in the next CF (along with analysis of improved workloads)
> would be appropriate.
>

I agree with your conclusion and have changed the status of the patch in the CF accordingly.

Many thanks for doing the tests.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
