Re: checkpoint patches

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: checkpoint patches
Date: 2012-03-22 19:36:50
Message-ID: CA+TgmoYzKnqF66tFwRwgXVN-UUQwu5O6X6rMywX7Ocx1vRRRnA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 22, 2012 at 9:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> However, looking at this a bit more, I think the
> checkpoint-sync-pause-v1 patch contains an obvious bug - the GUC is
> supposedly represented in seconds (though not marked with GUC_UNIT_S,
> oops) but what the sleep implements is actually *tenths of a second*.
> So I think I'd better rerun these tests with checkpoint_sync_pause=30
> so that I get a three-second delay rather than a
> three-tenths-of-a-second delay between each fsync.
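
To make the unit mix-up concrete, here is a minimal sketch of the arithmetic. It assumes only what the quoted text says, namely that the sleep treats the setting as tenths of a second; the function name is hypothetical and is not taken from the patch.

```python
def effective_pause_seconds(checkpoint_sync_pause):
    """Delay actually slept between fsyncs, in seconds.

    Assumption: the sleep interprets the setting as tenths of a second,
    as described in the quoted message. This is not the patch's real code.
    """
    return checkpoint_sync_pause / 10


# A setting of 3 yields three tenths of a second; 30 yields three seconds.
for setting in (3, 30):
    print(f"checkpoint_sync_pause={setting} -> {effective_pause_seconds(setting)} s")
```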

OK, I did that, rerunning the test with just checkpoint-sync-pause-v1
and master, still with scale factor 1000 and 32 clients. Tests were
run on the two branches in alternation, so checkpoint-sync-pause-v1,
then master, then checkpoint-sync-pause-v1, then master, etc., with a
new initdb and data load each time. TPS numbers:

checkpoint-sync-pause-v1: 2594.448538, 2600.231666, 2580.041376
master: 2466.399991, 2450.752843, 2291.613305

90th percentile latency:

checkpoint-sync-pause-v1: 1487, 1488, 1481
master: 1493, 1519, 1507

Taking the median run on each branch, that's about a 6% increase in
throughput and about a 1.3% reduction in 90th-percentile latency. On
the other hand, the two timed checkpoints
on the master branch, on each test run, are exactly 15 minutes apart,
whereas with the patch, they're 15 minutes and 30-40 seconds apart,
which may account for some of the difference. I'm going to do a bit
more testing to try to isolate that.
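
For anyone who wants to check the percentages, a short sketch of the arithmetic follows. The figures are copied from the lists above; the choice of the median as the per-branch summary is an assumption, made because it reproduces the quoted 6% and 1.3%. Summarizing by the mean instead gives a somewhat larger throughput gain, since the third master run is a low outlier.

```python
from statistics import mean, median

# TPS and 90th-percentile latency figures copied from the message above.
tps = {
    "checkpoint-sync-pause-v1": [2594.448538, 2600.231666, 2580.041376],
    "master": [2466.399991, 2450.752843, 2291.613305],
}
latency_p90 = {
    "checkpoint-sync-pause-v1": [1487, 1488, 1481],
    "master": [1493, 1519, 1507],
}


def pct_change(patched, baseline, summarize=median):
    """Percent change of the patched summary relative to the baseline summary."""
    base = summarize(baseline)
    return 100.0 * (summarize(patched) - base) / base


tps_gain = pct_change(tps["checkpoint-sync-pause-v1"], tps["master"])
lat_change = pct_change(latency_p90["checkpoint-sync-pause-v1"], latency_p90["master"])
tps_gain_mean = pct_change(tps["checkpoint-sync-pause-v1"], tps["master"], mean)

print(f"TPS change (median): {tps_gain:+.1f}%")
print(f"p90 latency change (median): {lat_change:+.1f}%")
print(f"TPS change (mean): {tps_gain_mean:+.1f}%")
```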

I'm attaching a possibly-interesting graph comparing the first
checkpoint-sync-pause-v1 run against the second master run; I chose
that particular combination because those are the runs with the median
tps results. It's interesting how eerily similar the two runs are to
each other; they have spikes and dips in almost the same places, and
what looks like random variation is apparently not so random after
all. The graph attached here is based on tps averaged over ten-second
intervals.
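
For reference, the ten-second averaging can be sketched roughly as below. This is an illustration, not the script used to produce the attached graph: it assumes the input is a plain list of transaction completion times in seconds, and the function name is hypothetical.

```python
from collections import Counter


def tps_by_interval(completion_times, interval=10.0):
    """Return (interval_start, tps) pairs over fixed-width windows.

    Assumption: completion_times holds one timestamp (in seconds) per
    completed transaction. Windows are aligned to the earliest timestamp.
    """
    if not completion_times:
        return []
    start = min(completion_times)
    # Count the transactions that completed in each window.
    counts = Counter(int((t - start) // interval) for t in completion_times)
    last = max(counts)
    # Windows with no transactions are reported as 0 tps. The final window
    # may be only partly covered by the run, so its value can read low.
    return [(start + i * interval, counts.get(i, 0) / interval)
            for i in range(last + 1)]


sample = [0.0, 1.0, 2.5, 9.9, 10.0, 25.0]
for window_start, tps in tps_by_interval(sample):
    print(f"{window_start:6.1f}s  {tps:.1f} tps")
```

Averaging over a window this wide smooths out per-second noise, which makes the shared spikes and dips between two runs easier to see.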

Thoughts? Comments? Ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
image/png 9.9 KB
