Re: Checkpointer on hot standby runs without looking checkpoint_segments

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, masao(dot)fujii(at)gmail(dot)com
Subject: Re: Checkpointer on hot standby runs without looking checkpoint_segments
Date: 2012-06-08 17:15:07
Message-ID: CA+Tgmoa8kTT0JLs1FQ7C43VbkboEFJnOsJRTBbgdm5XRLiFZkA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 8, 2012 at 1:01 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Jun8, 2012, at 15:47 , Robert Haas wrote:
>> On Fri, Jun 8, 2012 at 5:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> On 8 June 2012 09:14, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>
>>>> The requirement for this patch is as follows.
>>>>
>>>> - What I want is for the master and the (hot) standby to behave
>>>>  similarly with respect to checkpoint progression. Specifically,
>>>>  checkpoints during streaming replication should run at the pace
>>>>  governed by checkpoint_segments. The point of this patch is to
>>>>  avoid having an unexpectedly large number of WAL segments
>>>>  accumulate on the standby side. (Plus, it increases the chance
>>>>  of skipping the recovery-end checkpoint, per my other patch.)
>>>
>>> Since we want to keep wal_keep_segments WAL files on the master (and,
>>> because of cascading, on the standby as well), I don't see any purpose
>>> in triggering more frequent checkpoints just so we can hit a magic
>>> number that is most often set wrong.
>>
>> This is a good point.  Right now, if you set checkpoint_segments to a
>> large value, we retain lots of old WAL segments even when the system
>> is idle (cf. XLOGfileslop).  I think we could be smarter about that.
>> I'm not sure what the exact algorithm should be, but right now users
>> are forced to choose between setting checkpoint_segments very large
>> to achieve optimum write performance and setting it small to conserve
>> disk space.  What would be much better, IMHO, is if the number of
>> retained segments could ratchet down when the system is idle,
>> eventually reaching a state where we keep only one segment beyond the
>> one currently in use.
>
> I'm a bit sceptical about this. It seems to me that you wouldn't actually
> be able to do anything useful with the conserved space, since postgres
> could re-claim it at any time. At which point it'd better be available,
> or your whole cluster comes to a screeching halt...

Well, the issue for me is elasticity. Right now we ship with
checkpoint_segments=3. That causes terrible performance on many
real-world workloads. But say we ship with checkpoint_segments = 100,
which is a far better setting from a performance point of view. Then
pg_xlog space utilization will eventually grow to more than 3 GB, even
on a low-velocity system where the extra segments don't improve
performance. I'm not sure whether it's useful for the number of
checkpoint segments to vary dramatically on a single system, but I do
think it would be very nice if we could ship with a less conservative
default without eating up so much disk space. Maybe there's a better
way of going about that, but I agree with Simon's point that the
setting is often wrong. Frequently it's too low; sometimes it's too
high; occasionally it's got both problems simultaneously. If you have
another idea on how to improve this, I'm all ears.
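
For reference, the 3 GB figure is just the documented steady-state
bound of (2 + checkpoint_completion_target) * checkpoint_segments + 1
segments at 16 MB each.  A minimal standalone sketch of that
arithmetic (not PostgreSQL source; assumes the default 16 MB segment
size and checkpoint_completion_target = 0.5):

/*
 * Rough sketch: estimate the steady-state pg_xlog footprint for a
 * given checkpoint_segments, using the documented upper bound of
 * (2 + checkpoint_completion_target) * checkpoint_segments + 1 files
 * and the default 16 MB segment size.
 */
#include <stdio.h>

int
main(void)
{
    const double segment_mb = 16.0;        /* default WAL segment size */
    const double completion_target = 0.5;  /* default checkpoint_completion_target */
    int checkpoint_segments = 100;

    double max_files = (2 + completion_target) * checkpoint_segments + 1;

    printf("checkpoint_segments = %d -> up to ~%.0f segments, ~%.1f GB\n",
           checkpoint_segments, max_files, max_files * segment_mb / 1024.0);
    return 0;
}

With those defaults that works out to roughly 251 segments, or about
3.9 GB of pg_xlog, whether or not the workload ever benefits from it.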

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
