Re: checkpoint patches

From: Jim Nasby <jim(at)nasby(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: checkpoint patches
Date: 2012-03-25 20:29:01
Message-ID: 4F6F800D.8000808@nasby.net
Lists: pgsql-hackers

On 3/23/12 7:38 AM, Robert Haas wrote:
> And here are the latency results for 95th-100th percentile with
> checkpoint_timeout=16min.
>
> ckpt.master.13: 1703, 1830, 2166, 17953, 192434, 43946669
> ckpt.master.14: 1728, 1858, 2169, 15596, 187943, 9619191
> ckpt.master.15: 1700, 1835, 2189, 22181, 206445, 8212125
>
> The picture looks similar here. Increasing checkpoint_timeout isn't
> *quite* as good as spreading out the fsyncs, but it's pretty darn
> close. For example, looking at the median of the three 98th
> percentile numbers for each configuration, the patch bought us a 28%
> improvement in 98th percentile latency. But increasing
> checkpoint_timeout by a minute bought us a 15% improvement in 98th
> percentile latency. So it's still not clear to me that the patch is
> doing anything on this test that you couldn't get just by increasing
> checkpoint_timeout by a few more minutes. Granted, it lets you keep
> your inter-checkpoint interval slightly smaller, but that's not that
> exciting. That having been said, I don't have a whole lot of trouble
> believing that there are other cases where this is more worthwhile.

I wouldn't be too quick to dismiss increasing checkpoint frequency (i.e., decreasing checkpoint_timeout).

On a high-value production system you're going to care quite a bit about recovery time. I certainly wouldn't want to run our systems with checkpoint_timeout='15 min' if I could avoid it.

Another $0.02: I don't recall the community using pgbench much at all to measure latency... I believe that's fairly new. I point this out because the analysis you need to do for TPS differs from the analysis you need to do for latency. I think Robert's graphs support my argument; the numeric X-percentile data might not look terribly good, but reducing peak latency from 100ms to 60ms could be a really big deal on a lot of systems. My intuition is that one or both of these patches actually would be valuable in the real world; it would be a shame to throw them out because we're not sure how to performance test them...
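
To make the latency side of that concrete, here is a minimal sketch of the "median of three runs at each percentile" comparison, using only the three checkpoint_timeout=16min rows quoted above. It is not taken from the actual benchmark scripts: the function names are hypothetical, the column labels (95th through 100th) are assumed from Robert's description, and the units are whatever his tooling reported.

```python
from statistics import median

# Assumed column labels: the quoted rows are described as the
# 95th through 100th percentile latencies, one value per percentile.
PERCENTILES = [95, 96, 97, 98, 99, 100]

# The three checkpoint_timeout=16min runs, copied from the quoted message.
runs = {
    "ckpt.master.13": [1703, 1830, 2166, 17953, 192434, 43946669],
    "ckpt.master.14": [1728, 1858, 2169, 15596, 187943, 9619191],
    "ckpt.master.15": [1700, 1835, 2189, 22181, 206445, 8212125],
}


def median_at(percentile, runs):
    """Median across runs of the latency at one percentile (hypothetical helper)."""
    idx = PERCENTILES.index(percentile)
    return median(row[idx] for row in runs.values())


def percent_improvement(before, after):
    """Relative reduction in latency, as a percentage of the 'before' value."""
    return 100.0 * (before - after) / before


# One median per percentile, so configurations can be compared tail by tail.
summary = {p: median_at(p, runs) for p in PERCENTILES}

for p, value in summary.items():
    print(f"{p}th percentile, median of {len(runs)} runs: {value}")

# The 100ms -> 60ms example from the paragraph above.
print(f"peak latency reduction: {percent_improvement(100, 60):.0f}%")
```

Running the same summary over each configuration's rows would give the per-percentile comparison; the point is that the interesting differences sit in the last few columns, which a TPS average never shows.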
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
