Re: checkpointer continuous flushing

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2016-03-17 20:13:34
Message-ID: 34db4b9b-6bcb-8633-df87-064df76065e6@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 03/17/2016 06:36 PM, Fabien COELHO wrote:
>
> Hello Tomas,
>
> Thanks for these great measures.
>
>> * 4 x CPU E5-4620 (2.2GHz)
>
> 4*8 = 32 cores / 64 threads.

Yep. I only used 32 clients though, to keep some of the CPU available
for the rest of the system (also, HT does not really double the number
of cores).

>
>> * 256GB of RAM
>
> Wow!
>
>> * 24x SSD on LSI 2208 controller (with 1GB BBWC)
>
> Wow! RAID configuration ? The patch is designed to fix very big issues
> on HDD, but it is good to see that the impact is good on SSD as well.

Yep, RAID-10. I agree that doing the test on an HDD-based system would
be useful, however (a) I don't have a comparable system at hand at the
moment, and (b) I was a bit worried the patch might hurt performance on
SSDs, but thankfully that's not the case.

I will do the test on a much smaller system with HDDs in a few days.

>
> Is it possible to run tests with distinct table spaces on those many disks?

Nope, that would require reconfiguring the system (and then back again),
and I only have SSH access to it. Also, I don't quite see what that
would tell us.

>> * shared_buffers=64GB
>
> 1/4 of the available memory.
>
>> The pgbench was scale 60000, so ~750GB of data on disk,
>
> 3x the available memory, so mostly on disk.
>
>> or like this ("throttled"):
>>
>> pgbench -c 32 -j 8 -T 86400 -R 5000 -l --aggregate-interval=1 pgbench
>>
>> The reason for the throttling is that people generally don't run
>> production databases 100% saturated, so it'd be sad to improve the
>> 100% saturated case and hurt the common case by increasing latency.
>
> Sure.
>
>> The machine does ~8000 tps, so 5000 tps is ~60% of that.
>
> Ok.
>
> I would have suggested using the --latency-limit option to filter out
> very slow queries, otherwise if the system is stuck it may catch up
> later, but then this is not representative of "sustainable" performance.
>
> When pgbench is running under a target rate, in both runs the
> transaction distribution is expected to be the same, around 5000 tps,
> and the green run looks pretty ok with respect to that. The magenta one
> shows that about 25% of the time, things are not good at all, and the
> higher figures just show the catching up, which is not really
> interesting if you asked for a web page and it is finally delivered 1
> minute later.

Maybe. But that would only increase the stress on the system, possibly
causing more issues, no? And since the magenta line is the old code, it
would only make the improvement from the new code look even larger.

Note that the max latency is in microseconds (that's how pgbench logs
it), so according to the "max latency" charts the latencies stay below
10 seconds (old) and 1 second (new) about 99% of the time. So I don't
think this would make any measurable difference in practice.
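
FWIW, if I do rerun the throttled case with a latency limit as you
suggest, it would be something like this (the 100ms limit is just an
example value, and the log file names / column positions may differ
depending on the pgbench version):

    pgbench -c 32 -j 8 -T 86400 -R 5000 --latency-limit=100 \
            -l --aggregate-interval=1 pgbench

    # max latency in the aggregate log is in microseconds
    # (IIRC column 6 when running with -R), convert to milliseconds
    awk '{ print $1, $6 / 1000.0 }' pgbench_log.*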

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
