Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-10-31 21:48:01
Message-ID: 5275bf49-545e-e189-48c5-17b5defc45a2@2ndquadrant.com
Lists: pgsql-hackers

On 10/31/2016 02:24 PM, Tomas Vondra wrote:
> On 10/31/2016 05:01 AM, Jim Nasby wrote:
>> On 10/30/16 1:32 PM, Tomas Vondra wrote:
>>>
>>> Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's
>>> some sort of CPU / OS scheduling artifact. For example, the system has
>>> 36 physical cores, 72 virtual ones (thanks to HT). I find it strange
>>> that the "good" client counts are always multiples of 72, while the
>>> "bad" ones fall in between.
>>>
>>> 72 = 72 * 1 (good)
>>> 108 = 72 * 1.5 (bad)
>>> 144 = 72 * 2 (good)
>>> 180 = 72 * 2.5 (bad)
>>> 216 = 72 * 3 (good)
>>> 252 = 72 * 3.5 (bad)
>>> 288 = 72 * 4 (good)
>>>
>>> So maybe this has something to do with how the OS schedules the tasks,
>>> or maybe some internal heuristics in the CPU, or something like that.
>>
>> It might be enlightening to run a series of tests that are 72*.1 or *.2
>> apart (say, 72, 79, 86, ..., 137, 144).
>
> Yeah, I've started a benchmark with a step of 6 clients
>
> 36 42 48 54 60 66 72 78 ... 252 258 264 270 276 282 288
>
> instead of just
>
> 36 72 108 144 180 216 252 288
>
> which did a test every 36 clients. To compensate for the 6x longer runs,
> I'm only running tests for "group-update" and "master", so I should have
> the results in ~36h.
>
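(If anyone wants to replicate the sweep, it's just a loop over the client
counts -- roughly like the sketch below. The duration, the pgbench flags
and the "bench" database name are placeholders, not necessarily what my
runs actually use:)

import subprocess

# Rough sketch of the sweep driver -- duration/flags are placeholders,
# and switching between the master and patched binaries (plus restarting
# the cluster) is left out, as that depends on the local setup.
for clients in range(36, 289, 6):        # 36, 42, ..., 288
    out = subprocess.check_output(
        ["pgbench", "-n", "-M", "prepared",
         "-c", str(clients), "-j", str(clients),
         "-T", "300", "bench"],
        universal_newlines=True)
    # pgbench reports throughput on lines starting with "tps = ..."
    tps = [l for l in out.splitlines() if l.startswith("tps")]
    print(clients, tps[0] if tps else "?")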

So I got curious and looked at the results of the runs executed so far;
for the group_update patch it looks like this:

clients       tps
-----------------
     36    117663
     42    139791
     48    129331
     54    144970
     60    124174
     66    137227
     72    146064
     78    100267
     84    141538
     90     96607
     96    139290
    102     93976
    108    136421
    114     91848
    120    133563
    126     89801
    132    132607
    138     87912
    144    129688
    150     87221
    156    129608
    162     85403
    168    130193
    174     83863
    180    129337
    186     81968
    192    128571
    198     82053
    204    128020
    210     80768
    216    124153
    222     80493
    228    125503
    234     78950
    240    125670
    246     78418
    252    123532
    258     77623
    264    124366
    270     76726
    276    119054
    282     76960
    288    121819

So, similar saw-like behavior, perfectly periodic. But the really strange
thing is that the peaks/valleys don't match those observed before!

That is, during the previous runs 72, 144, 216 and 288 were "good" while
108, 180 and 252 were "bad". But in these new runs, all of those client
counts are "good" ...

Honestly, I have no idea what to think about this ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
