Re: Spread checkpoint sync

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spread checkpoint sync
Date: 2011-01-15 22:57:02
Message-ID: 4D32263E.8000100@2ndquadrant.com
Lists: pgsql-hackers

Robert Haas wrote:
> That seems like a bad idea - don't we routinely recommend that people
> crank this up to 0.9? You'd be effectively bounding the upper range
> of this setting to a value less than the lowest value we
> recommend anyone use today.
>

I was just giving an example of how I might do an initial split.
There's a checkpoint happening now at time T; we have a rough idea that
it needs to be finished before some upcoming time T+D. Currently with
default parameters this becomes:

Write: 0.5 * D; Sync: 0

Even though Sync obviously doesn't take zero. The slop here is enough
that it usually works anyway.

I was suggesting that a quick reshuffling to:

Write: 0.4 * D; Sync: 0.4 * D

Might be a good first candidate for how to split the time up better.
It's true that this gives less writing time than the current biggest
spread possible:

Write: 0.9 * D; Sync: 0

It's also true that in the case where sync time really is
zero, this new default would spread writes less than the current
default. I think that's optimistic, but it could happen if checkpoints
are small and you have a good write cache.
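To make the arithmetic concrete, here's a small sketch of how a fixed
interval D would be split under the current default versus the proposed
0.4/0.4 scheme. The function and fraction names here are illustrative
only; the sync fraction is the hypothetical new knob under discussion,
not an actual GUC:

```python
def split_checkpoint_interval(d_seconds, write_frac, sync_frac):
    """Split a checkpoint interval into write-phase and sync-phase
    time budgets.  write_frac plays the role of
    checkpoint_completion_target; sync_frac is the hypothetical
    new sync budget discussed above."""
    return write_frac * d_seconds, sync_frac * d_seconds

# Current default: completion_target = 0.5, sync gets no explicit budget
write, sync = split_checkpoint_interval(300, 0.5, 0.0)  # 150.0 s write, 0 s sync

# Proposed first-cut split: 0.4 write, 0.4 sync
write, sync = split_checkpoint_interval(300, 0.4, 0.4)  # ~120 s for each phase
```

The remaining 0.2 * D in the second case is the same kind of slop the
current default leaves for the checkpoint to finish before the next one
is due.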

Step back from that a second though. Ultimately, the person who is
getting checkpoints at a 5 minute interval, and is being nailed by
spikes, should have the option of just increasing the parameters to make
that interval bigger. First you increase the measly default segments to
a reasonable range, then checkpoint_completion_target is the second one
you can try. But from there, you quickly move onto making
checkpoint_timeout longer. At some point, there is no option but to
give up checkpoints every 5 minutes as being practical, and make the
average interval longer.
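That tuning progression might look like this in postgresql.conf; the
values here are illustrative examples of the escalation path, not
recommendations for any particular system:

```
# Step 1: raise the measly default number of WAL segments
checkpoint_segments = 64          # default is only 3

# Step 2: spread the write phase across more of the interval
checkpoint_completion_target = 0.9

# Step 3: if spikes persist, give up on 5 minute checkpoints
#         and make the average interval longer
checkpoint_timeout = 15min        # default is 5min
```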

Whether or not a refactoring here makes things slightly worse for cases
closer to the default doesn't bother me too much. What bothers me is
the way trying to stretch checkpoints out further fails to deliver as
well as it should. I'd be OK with saying "to get the exact same spread
situation as in older versions, you may need to retarget for checkpoints
every 6 minutes" *if* in the process I get a much better sync latency
distribution in most cases.

Here's an interesting data point from the customer site this all started
at, one I don't think they'll mind sharing since it helps make the
situation more clear to the community. After applying this code to
spread sync out, in order to get their server back to functional we had
to move all the parameters involved up to where checkpoints were spaced
35 minutes apart. It just wasn't possible to write any faster than that
without disrupting foreground activity.

The whole current model where people think of this stuff in terms of
segments and completion targets is a UI disaster. The direction I want
to go in is where users can say "make sure checkpoints happen every N
minutes", and something reasonable happens without additional parameter
fiddling. And if the resulting checkpoint I/O spike is too big, they
just increase the timeout to N+1 or N*2 to spread the checkpoint
further. Getting too bogged down thinking in terms of the current,
really terrible interface is something I'm trying to break myself of.
Long-term, I want there to be checkpoint_timeout, and all the other
parameters are gone, replaced by an internal implementation of the best
practices proven to work even on busy systems. I don't have as much
clarity yet on exactly what that best practice is, in the way that,
say, I just suggested exactly how to eliminate wal_buffers as an
important thing to set manually. But that's the dream UI: you shoot for a checkpoint
interval, and something reasonable happens; if that's too intense, you
just increase the interval to spread further. There will probably be
small performance regressions possible vs. the current code with
parameter combinations that happen to work well on it. Preserving every
one of those is something that's not as important to me as making the
tuning interface simple and clear.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
