Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-09-23 14:52:40
Message-ID: 5da94f12-8141-2f2f-016a-09a8e37bdd30@2ndquadrant.com

On 09/23/2016 03:07 PM, Amit Kapila wrote:
> On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On 09/23/2016 01:44 AM, Tomas Vondra wrote:
>>>
>>> ...
>>> The 4.5 kernel clearly changed the results significantly:
>>>
>> ...
>>>
>>>
>>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>>> eliminated the fluctuations in the results. For example when 3.2.80
>>> produced these results (10 runs with the same parameters):
>>>
>>> 12118 11610 27939 11771 18065
>>> 12152 14375 10983 13614 11077
>>>
>>> we get this on 4.5.5
>>>
>>> 37354 37650 37371 37190 37233
>>> 38498 37166 36862 37928 38509
>>>
>>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>>>
>>
>> The more I think about these random spikes in pgbench performance on 3.2.80,
>> the more I find them intriguing. Let me show you another example (from
>> Dilip's workload and group-update patch on 64 clients).
>>
>> This is on 3.2.80:
>>
>> 44175 34619 51944 38384 49066
>> 37004 47242 36296 46353 36180
>>
>> and on 4.5.5 it looks like this:
>>
>> 34400 35559 35436 34890 34626
>> 35233 35756 34876 35347 35486
>>
>> So the 4.5.5 results are much more even, but overall clearly below 3.2.80.
>> How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we
>> randomly do something right, but what is it and why doesn't it happen on the
>> new kernel? And how could we do it every time?
>>
>
> As far as I can see you are using default values of min_wal_size,
> max_wal_size, checkpoint related params. Have you changed the default
> shared_buffers setting? That can have a bigger impact.

Huh? Where do you see me using default values? There is a settings.log
with a dump of pg_settings data, and the modified values are

checkpoint_completion_target = 0.9
checkpoint_timeout = 3600
effective_io_concurrency = 32
log_autovacuum_min_duration = 100
log_checkpoints = on
log_line_prefix = %m
log_timezone = UTC
maintenance_work_mem = 524288
max_connections = 300
max_wal_size = 8192
min_wal_size = 1024
shared_buffers = 2097152
synchronous_commit = on
work_mem = 524288

(ignoring some irrelevant stuff like locales, timezone etc.).

> Using default values of mentioned parameters can lead to checkpoints in
> between your runs.

So I'm using 16GB shared buffers (so with scale 300 everything fits into
shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint
timeout 1h etc. So no, there are no checkpoints during the 5-minute
runs, only those triggered explicitly before each run.
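
For reference, the raw pg_settings values listed above can be converted to those sizes roughly like this. It is only a sketch: the units (8kB pages for shared_buffers, 16MB segments for min_wal_size/max_wal_size, kB for work_mem and maintenance_work_mem) are assumptions about how this release reports them, and the names RAW, UNIT_BYTES and to_gib are made up for illustration.

```python
# Sketch: convert raw pg_settings values into GiB.
# ASSUMPTION: the units below are not stated in the message; check the
# "unit" column of pg_settings on the actual server before relying on them.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

UNIT_BYTES = {
    "shared_buffers": 8 * KIB,        # assumed: 8kB pages
    "min_wal_size": 16 * MIB,         # assumed: 16MB WAL segments
    "max_wal_size": 16 * MIB,         # assumed: 16MB WAL segments
    "work_mem": KIB,                  # assumed: kB
    "maintenance_work_mem": KIB,      # assumed: kB
}

# Raw values copied from the settings list in this message.
RAW = {
    "shared_buffers": 2097152,
    "min_wal_size": 1024,
    "max_wal_size": 8192,
    "work_mem": 524288,
    "maintenance_work_mem": 524288,
}


def to_gib(name, raw_value):
    """Return the setting's size in GiB under the assumed unit."""
    return raw_value * UNIT_BYTES[name] / GIB


sizes_gib = {name: to_gib(name, raw) for name, raw in RAW.items()}

for name, size in sizes_gib.items():
    print(f"{name:22s} {size:8.2f} GiB")
```

Under those assumed units the numbers agree with the 16GB / 16GB / 128GB figures quoted above.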

> Also, I think instead of 5 mins, read-write runs should be run for 15
> mins to get consistent data.

Where does the inconsistency come from? Lack of warmup? Considering how
uniform the results from the 10 runs are (at least on 4.5.5), I claim
this is not an issue.
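
To put a number on "uniform", one option is the coefficient of variation (sample standard deviation divided by the mean) of the tps values quoted earlier in this thread. The snippet below is just a sketch of that calculation; the dictionary keys and the helper name coeff_of_variation are made up for illustration, and the metric itself is my choice, not something the benchmark scripts report.

```python
# Sketch: compare run-to-run variability of the quoted pgbench results.
from statistics import mean, stdev

# tps values copied from the quoted results (10 runs each).
RUNS = {
    "3.2.80 (first example)": [12118, 11610, 27939, 11771, 18065,
                               12152, 14375, 10983, 13614, 11077],
    "4.5.5 (first example)":  [37354, 37650, 37371, 37190, 37233,
                               38498, 37166, 36862, 37928, 38509],
    "3.2.80 (Dilip, 64 clients)": [44175, 34619, 51944, 38384, 49066,
                                   37004, 47242, 36296, 46353, 36180],
    "4.5.5 (Dilip, 64 clients)":  [34400, 35559, 35436, 34890, 34626,
                                   35233, 35756, 34876, 35347, 35486],
}


def coeff_of_variation(values):
    """Sample standard deviation relative to the mean (dimensionless)."""
    return stdev(values) / mean(values)


summary = {
    label: (mean(values), coeff_of_variation(values))
    for label, values in RUNS.items()
}

for label, (avg, cv) in summary.items():
    print(f"{label:28s} mean={avg:9.1f} tps  cv={cv:.3f}")
```

A low coefficient of variation on 4.5.5 and a much higher one on 3.2.80 is what "more even" means here; it says nothing about why 3.2.80 sometimes does better.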

> For Dilip's workload where he is using only Select ... For Update, I
> think it is okay, but otherwise you need to drop and re-create the
> database between each run, otherwise data bloat could impact the
> readings.

And why should it affect 3.2.80 and 4.5.5 differently?

>
> I think in general, the impact should be same for both the kernels
> because you are using same parameters, but I think if you use
> appropriate parameters, then you can get consistent results for
> 3.2.80. I have also seen variation in read-write tests, but the
> variation you are showing is really a matter of concern, because it
> will be difficult to rely on final data.
>

Both kernels use exactly the same parameters (fairly tuned, IMHO).

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
