Re: Load Distributed Checkpoints test results

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>
Subject: Re: Load Distributed Checkpoints test results
Date: 2007-06-20 17:58:14
Message-ID: 46796AB6.8060009@enterprisedb.com
Lists: pgsql-hackers

I've uploaded the latest test results to the results page at
http://community.enterprisedb.com/ldc/

The test results on the index page are not in a completely logical
order, sorry about that.

I ran a series of tests with 115 warehouses, and there were no surprises.
LDC smooths the checkpoints nicely.

Another series with 150 warehouses is more interesting. At that number of
warehouses, the data disks are 100% busy according to iostat. The 90th
percentile response times are somewhat higher with LDC, though the
variability in both the baseline and LDC test runs seems to be pretty
high. Looking at the response time graphs, even with LDC there are clear
checkpoint spikes, but they're much less severe than without.

Another series was with 90 warehouses, but without think times, driving
the system to full load. LDC seems to smooth the checkpoints very nicely
in these tests.

Heikki Linnakangas wrote:
> Gregory Stark wrote:
>> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>>> Now that the checkpoints are spread out more, the response times are
>>> very smooth.
>>
>> So obviously the reason the results are so dramatic is that the
>> checkpoints used to push the i/o bandwidth demand up over 100%. By
>> spreading it out you can see in the io charts that even during the
>> checkpoint the i/o busy rate stays just under 100% except for a few
>> data points.
>>
>> If I understand it right Greg Smith's concern is that in a busier
>> system where even *with* the load distributed checkpoint the i/o
>> bandwidth demand during the checkpoint was *still* being pushed over
>> 100% then spreading out the load would only exacerbate the problem by
>> extending the outage.
>>
>> To that end it seems like what would be useful is a pair of tests
>> with and without the patch with about 10% larger warehouse size
>> (~ 115) which would push the i/o bandwidth demand up to about that
>> level.
>
> I still don't see how spreading the writes could make things worse, but
> running more tests is easy. I'll schedule tests with more warehouses
> over the weekend.
>
>> It might even make sense to run a test with an outright overloaded
>> system to see if the patch doesn't exacerbate the condition.
>> Something with a warehouse size of maybe 150. I would expect it to
>> fail the TPCC constraints either way but what would be interesting to
>> know is whether it fails by a larger margin with the LDC behaviour or
>> a smaller margin.
>
> I'll do that as well, though experiences with tests like that in the
> past have been that it's hard to get repeatable results that way.
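
The two positions in the quoted exchange above can be put side by side
with a toy model. This is only a sketch under stated assumptions: the
disk has a fixed capacity, normal traffic uses a constant fraction of it
(baseline), and a checkpoint adds a fixed amount of write work that is
spread evenly over the checkpoint duration. The function names and all
numbers are hypothetical and are not taken from the test runs.

```python
def demand_during_checkpoint(baseline, checkpoint_work, duration):
    """Total I/O demand, as a fraction of disk capacity, while the
    checkpoint is being written.

    baseline        -- fraction of capacity used by normal traffic (assumed constant)
    checkpoint_work -- checkpoint writes, in seconds of full disk capacity
    duration        -- seconds over which the checkpoint writes are spread
    """
    return baseline + checkpoint_work / duration


def backlog_at_end(baseline, checkpoint_work, duration):
    """I/O still unserved when the checkpoint window ends, in seconds of
    full disk capacity. Zero if demand never exceeded capacity."""
    excess = demand_during_checkpoint(baseline, checkpoint_work, duration) - 1.0
    return max(0.0, excess) * duration


if __name__ == "__main__":
    work = 30.0  # hypothetical checkpoint size
    for baseline in (0.80, 0.95):  # hypothetical "normal" and "busier" systems
        for label, duration in (("burst", 30.0), ("spread", 150.0)):
            demand = demand_during_checkpoint(baseline, work, duration)
            backlog = backlog_at_end(baseline, work, duration)
            print(f"baseline={baseline:.2f} {label:6s} "
                  f"demand={demand:.2f} backlog={backlog:.1f}")
```

In this model, while the baseline stays below capacity, spreading the
writes always lowers the peak demand and never increases the backlog.
In the busier case, though, demand stays above 100% for the whole longer
window, which is the "extending the outage" concern. Real disks and the
real patch are of course more complicated than this.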

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
