Re: Speed up Clog Access by increasing CLOG buffers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-09-23 13:07:24
Message-ID: CAA4eK1K4HEsy819bkDxA3GxGBRsBvu9MmuGh3Q_CxUho29FG4A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 09/23/2016 01:44 AM, Tomas Vondra wrote:
>>
>> ...
>> The 4.5 kernel clearly changed the results significantly:
>>
> ...
>>
>>
>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>> eliminated the fluctuations in the results. For example when 3.2.80
>> produced this results (10 runs with the same parameters):
>>
>> 12118 11610 27939 11771 18065
>> 12152 14375 10983 13614 11077
>>
>> we get this on 4.5.5
>>
>> 37354 37650 37371 37190 37233
>> 38498 37166 36862 37928 38509
>>
>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>>
>
> The more I think about these random spikes in pgbench performance on 3.2.80,
> the more I find them intriguing. Let me show you another example (from
> Dilip's workload and group-update patch on 64 clients).
>
> This is on 3.2.80:
>
> 44175 34619 51944 38384 49066
> 37004 47242 36296 46353 36180
>
> and on 4.5.5 it looks like this:
>
> 34400 35559 35436 34890 34626
> 35233 35756 34876 35347 35486
>
> So the 4.5.5 results are much more even, but overall clearly below 3.2.80.
> How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we
> randomly do something right, but what is it and why doesn't it happen on the
> new kernel? And how could we do it every time?
>

As far as I can see, you are using the default values of min_wal_size,
max_wal_size, and the checkpoint-related parameters. Have you changed
the default shared_buffers setting? That can have a bigger impact.
Using the default values of these parameters can lead to checkpoints
in the middle of your runs. Also, I think read-write runs should last
15 minutes rather than 5 to get consistent data. For Dilip's workload,
which uses only SELECT ... FOR UPDATE, I think that is okay, but
otherwise you need to drop and re-create the database between runs,
because data bloat could affect the readings.

In general, I think the impact should be the same on both kernels,
because you are using the same parameters, but I think that with
appropriate parameters you can get consistent results on 3.2.80 too.
I have also seen variation in read-write tests, but the variation you
are showing is a real concern, because it makes it difficult to rely
on the final data.
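
To put a rough number on that variation, here is a minimal sketch that
summarizes the four sets of tps figures quoted above. It assumes those
ten values per set are the complete results for each configuration; the
dictionary keys and the summarize() helper are names made up for this
illustration, and the coefficient of variation (sample standard
deviation divided by the mean) is just one possible measure of how
noisy a set of runs is.

```python
from statistics import mean, stdev

# tps values copied from the quoted results (10 runs each).
# The keys are illustrative labels, not names used in the tests.
RUNS = {
    "pgbench/3.2.80": [12118, 11610, 27939, 11771, 18065,
                       12152, 14375, 10983, 13614, 11077],
    "pgbench/4.5.5": [37354, 37650, 37371, 37190, 37233,
                      38498, 37166, 36862, 37928, 38509],
    "dilip/3.2.80": [44175, 34619, 51944, 38384, 49066,
                     37004, 47242, 36296, 46353, 36180],
    "dilip/4.5.5": [34400, 35559, 35436, 34890, 34626,
                    35233, 35756, 34876, 35347, 35486],
}


def summarize(tps):
    """Return mean, min, max and coefficient of variation for one set of runs."""
    m = mean(tps)
    return {
        "mean": m,
        "min": min(tps),
        "max": max(tps),
        # Sample standard deviation relative to the mean.
        "cv": stdev(tps) / m,
    }


if __name__ == "__main__":
    for label, tps in RUNS.items():
        s = summarize(tps)
        print(f"{label:16s} mean={s['mean']:9.1f} "
              f"min={s['min']:6d} max={s['max']:6d} cv={s['cv']:.3f}")
```

Comparing the cv column before and after changing the checkpoint and
shared_buffers settings would show whether the 3.2.80 runs have become
consistent enough to rely on.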

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
