Re: Checkpointer on hot standby runs without looking checkpoint_segments

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, masao(dot)fujii(at)gmail(dot)com
Subject: Re: Checkpointer on hot standby runs without looking checkpoint_segments
Date: 2012-06-08 13:47:52
Message-ID: CA+TgmoYTEQDZLZ_6WFKPxnOh-4LU0-0WjW-LncgXB3SQCCQuMQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 8, 2012 at 5:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 8 June 2012 09:14, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
>> The requirement for this patch is as follows.
>>
>> - What I want to get is similarity of the behaviors between
>>  master and (hot-)standby concerning checkpoint
>>  progression. Specifically, checkpoints for streaming
>>  replication running at the speed governed with
>>  checkpoint_segments. The work of this patch is avoiding to get
>>  unexpectedly large number of WAL segments stay on standby
>>  side. (Plus, increasing the chance to skip recovery-end
>>  checkpoint by my another patch.)
>
> Since we want wal_keep_segments number of WAL files on master (and
> because of cascading, on standby also), I don't see any purpose to
> triggering more frequent checkpoints just so we can hit a magic number
> that is most often set wrong.

This is a good point. Right now, if you set checkpoint_segments to a
large value, we retain lots of old WAL segments even when the system
is idle (cf. XLOGfileslop). I think we could be smarter about that.
I'm not sure what the exact algorithm should be, but right now users
are forced to choose between setting checkpoint_segments very large
to achieve optimum write performance and setting it small to conserve
disk space.
What would be much better, IMHO, is if the number of retained
segments could ratchet down when the system is idle, eventually
reaching a state where we keep only one segment beyond the one
currently in use.

For example, suppose I have checkpoint_timeout=10min and
checkpoint_segments=300. If, five minutes into the ten-minute
checkpoint interval, I've only used 10 WAL segments, then I probably
am not going to need another 290 of them in the remaining five
minutes. We ought to keep, say, 20 in that case (the number we expect
to need * 2, similar to bgwriter_lru_multiplier) and delete the rest.
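
The arithmetic in that example can be sketched roughly as follows. This
is only an illustration in Python, not PostgreSQL code: the function
name, its parameters, and the floor of one spare segment are assumptions
made for the sketch, and the real logic would have to live in the C
checkpoint code.

```python
import math


def segments_to_keep(elapsed_s, timeout_s, used_segments,
                     checkpoint_segments, multiplier=2):
    """Estimate how many spare WAL segments to retain for the rest of
    the current checkpoint interval (hypothetical helper)."""
    if elapsed_s <= 0:
        # No usage history yet: fall back to the configured ceiling.
        return checkpoint_segments
    remaining_s = max(timeout_s - elapsed_s, 0)
    # Extrapolate the usage seen so far over the remaining time.
    expected = used_segments * remaining_s / elapsed_s
    keep = math.ceil(expected * multiplier)
    # Never keep fewer than one spare segment, never more than the cap.
    return max(1, min(keep, checkpoint_segments))


# checkpoint_timeout=10min, checkpoint_segments=300, 10 segments used
# after five minutes.
print(segments_to_keep(300, 600, 10, 300))  # 20
```

With no WAL activity at all, the same function falls to 1, which is the
"one segment beyond the one currently in use" end state; under heavy
load it is capped at checkpoint_segments.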

If we did that, people could set checkpoint_segments much higher to
handle periods of peak load without continuously consuming large
amounts of space with old, useless WAL segments. Keeping them around
for recycling doesn't end up working very well anyway, because the old
WAL segments are no longer in cache by the time we go to overwrite them.

> ISTM that we should avoid triggering a checkpoint on the master if
> checkpoint_segments is less than wal_keep_segments. Such checkpoints
> serve no purpose because we don't actually limit and recycle the WAL
> files and all it does is slow people down.

On the other hand, I emphatically disagree with this, for the same
reasons as on the other thread. Getting data down to disk provides a
greater measure of safety than having it in memory. Making
checkpoint_segments not force a checkpoint is no better than making
checkpoint_timeout not force a checkpoint.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
