Re: Redesigning checkpoint_segments

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Redesigning checkpoint_segments
Date: 2013-06-07 02:43:31
Message-ID: 51B148D3.9010208@2ndQuadrant.com
Lists: pgsql-hackers

On 6/6/13 4:41 AM, Heikki Linnakangas wrote:

> I was thinking of letting the estimate
> decrease like a moving average, but react to any increases immediately.
> Same thing we do in bgwriter to track buffer allocations:

Combine what your submitted patch does and this idea, and you'll have
something I prototyped a few years ago. I took the logic and tested it
out in user space by parsing the output from log_checkpoints to see how
many segments were being used. That approach coughed out a value about
as good for checkpoint_segments as I picked by hand.
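
In rough Python terms, that user-space version looked something like this
(a sketch of the idea rather than the script I actually used; it assumes
the 9.x log_checkpoints line format and counts added + recycled segments
as each checkpoint's usage):

import re
import sys

# Match the segment accounting in a "checkpoint complete" log line.
pattern = re.compile(r'(\d+) transaction log file\(s\) added, '
                     r'(\d+) removed, (\d+) recycled')

estimate = 0.0
smoothing = 16.0   # decay factor, same spirit as the bgwriter's smoothing

for line in sys.stdin:
    m = pattern.search(line)
    if not m:
        continue
    added, removed, recycled = (int(g) for g in m.groups())
    used = added + recycled
    if used > estimate:
        # React to any increase immediately.
        estimate = float(used)
    else:
        # Let the estimate decrease like a moving average.
        estimate += (used - estimate) / smoothing
    print("used %d, estimate %.1f" % (used, estimate))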

The main problem was that it liked to over-tune the segments based on
small bursts of activity, leaving a value higher than you might want to use
the rest of the time. The background writer didn't worry about this
very much because the cost of making a mistake for one 200ms cycle was
pretty low. Setting checkpoint_segments high is a more expensive issue.
When I set these by hand, I'll aim more to cover the 99th percentile of
the observed segment counts rather than every worst case seen.
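
In the same user-space terms, that just means sorting the per-checkpoint
counts and taking a cutoff instead of the maximum, something like:

def pick_segments(samples, pct=0.99):
    # samples: per-checkpoint segment counts, e.g. from the parser above.
    # Cover the 99th percentile of usage rather than the single worst case.
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(pct * len(ordered)))
    return ordered[idx]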

I don't think that improvement is worth spending very much effort on
though. The moving average approach is more than good enough in most
cases. I've wanted checkpoint_segments to go away in exactly this
fashion for a while.

The general complaint the last time I suggested a change in this area,
to make checkpoint_segments larger for the average user, was that some
people had seen workloads where that was counterproductive. Pretty sure
Kevin Grittner said he'd seen that happen. That's how I remember this
general idea dying the last time, and I still don't have enough data to
refute the claim that it happens.

As far as the UI, if it's a soft limit I'd suggest wal_size_target for
the name. What I would like to see is a single number here in memory
units that replaces both checkpoint_segments and wal_keep_segments. If
you're willing to use a large chunk of disk space to handle either
activity spikes or the class of replication issues wal_keep_segments
targets, I don't see why you'd want to ban using that space for the
other one too.

To put some perspective on how far we've been able to push this in the
field with minimal gripes, the repmgr tool requires wal_keep_segments be
>=5000, which works out to 78GB. I still see some people use 73GB SAS
drives in production servers for their WAL files, but that's the only
time I've seen that number become scary when deploying repmgr.
Meanwhile, the highest value for checkpoint_segments I've set based on
real activity levels was 1024, on a server where checkpoint_timeout is
15 minutes (and can be no shorter without checkpoint spikes). At no
point during that fairly difficult bit of tuning work did
checkpoint_segments do anything but get in the way.
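
For reference, the arithmetic behind those numbers, at 16MB per WAL
segment:

segment = 16 * 1024 ** 2
print(5000 * segment / float(1024 ** 3))   # wal_keep_segments >= 5000 is ~78GB
print(1024 * segment / float(1024 ** 3))   # 1024 segments is 16GB
# Peak pg_xlog with checkpoint_segments is roughly
# (2 + checkpoint_completion_target) * checkpoint_segments segments,
# so the real footprint at 1024 is closer to 3x that last figure.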

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
