Re: Why checkpoint_timeout had maximum value of 1h?

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Why checkpoint_timeout had maximum value of 1h?
Date: 2012-04-03 06:53:24
Message-ID: 4F7A9E64.4050005@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 03/29/2012 06:57 AM, Maxim Boguk wrote:
> Is there any real reason why checkpoint_timeout limited to 1hour?

Just to keep people from accidentally setting a value that's dangerously
high. There can be some pretty bad drops in performance if you let
writes pile up for too long, once the checkpoint really does start running.

> In my case I have some replicas with WAL on SAS raid and PGDATA on SSD
> with limited write endurance....In that case having
> checkpoint_timeout=10hour could reduce amout of writes on SSD by
> factor of 10, and increase planned ssd lifetime by the same amount.

The big write endurance problem is WAL data, and you're already
addressing that. Note that if nothing has been written out since the
last one, the checkpoint won't actually do anything. So this 10X
endurance idea might only work out on a system that's always doing
something. You'll certainly get less wear; without measuring your
workload better, I can't say just what the multiplier is.

The other idea you should be considering, if you haven't already, is not
provisioning all of the space.

> I would like to have ability to set checkpoint_timeout=high value
> and (whats even better) checkpoint_timeout=0 - in that case checkpoint
> happen when all checkpoint_segments were used.
> Is there any serious drawbacks in that idea?
> Is it safe to increase that limit in source and rebuild database?
> (9.0 and 9.1 case)

You can edit src/backend/utils/misc/guc.c , find checkpoint_time, and
change the 3600 value there to something higher. You will need to
rebuild the whole database cluster with that setting (initdb), and
moving a database cluster of files between your tweaked version to/from
a regular PostgreSQL will do strange things. You can prevent that from
happening accidentally by editing src/include/catalog/catversion.h ;
find the line that looks like this:

#define CATALOG_VERSION_NO 201202141

And change it. It's just YYYYMMDDN to create a complete catalog serial
number, where N is an incrementing number if more than one change is
made on the same day. If you do that and increase the upper bound on
checkpoint_timeout, that should do what you want, while protecting
against the dangerous situation--where system catalog doesn't match the
database binaries.

Setting checkpoint_timeout to 0 instead won't work--it will checkpoint
all of the time then. The bottom limit is 30 seconds and you don't want
to touch that. It's possible to make 0 mean "never timeout", but that
would require juggling a couple of code pieces around. The idea of just
making the timeout big is a safer thing to consider. I'm not sure if
you'll really see the gains you're hoping for, but it should be easy
enough to test.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Greg Smith 2012-04-03 07:01:33 Re: oracle linux
Previous Message Alban Hertroys 2012-04-03 06:47:51 Re: Please help me to take a look of the erros in my functions. Thanks.