Re: Checkpointer on hot standby runs without looking checkpoint_segments

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com, masao(dot)fujii(at)gmail(dot)com
Subject: Re: Checkpointer on hot standby runs without looking checkpoint_segments
Date: 2012-06-08 17:15:07
Message-ID: CA+Tgmoa8kTT0JLs1FQ7C43VbkboEFJnOsJRTBbgdm5XRLiFZkA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 8, 2012 at 1:01 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Jun8, 2012, at 15:47 , Robert Haas wrote:
>> On Fri, Jun 8, 2012 at 5:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> On 8 June 2012 09:14, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>
>>>> The requirement for this patch is as follows.
>>>>
>>>> - What I want is for the master and the (hot) standby to behave
>>>>  similarly with respect to checkpoint progression. Specifically,
>>>>  checkpoints during streaming replication should run at the pace
>>>>  governed by checkpoint_segments. The point of this patch is to
>>>>  avoid having an unexpectedly large number of WAL segments
>>>>  accumulate on the standby side. (Plus, it increases the chance
>>>>  of skipping the recovery-end checkpoint, per my other patch.)
>>>
>>> Since we want to keep wal_keep_segments WAL files on the master (and,
>>> because of cascading, on the standby as well), I don't see any purpose
>>> in triggering more frequent checkpoints just so we can hit a magic
>>> number that is most often set wrong.
>>
>> This is a good point.  Right now, if you set checkpoint_segments to a
>> large value, we retain lots of old WAL segments even when the system
>> is idle (cf. XLOGfileslop).  I think we could be smarter about that.
>> I'm not sure what the exact algorithm should be, but right now users
>> are forced to choose between setting checkpoint_segments very large
>> to achieve optimum write performance and setting it small to conserve
>> disk space.  What would be much better, IMHO, is if the number of
>> retained segments could ratchet down when the system is idle,
>> eventually reaching a state where we keep only one segment beyond the
>> one currently in use.
>
> I'm a bit sceptical about this. It seems to me that you wouldn't actually
> be able to do anything useful with the conserved space, since postgres
> could re-claim it at any time. At which point it'd better be available,
> or your whole cluster comes to a screeching halt...

Well, the issue for me is elasticity. Right now we ship with
checkpoint_segments=3. That causes terrible performance on many
real-world workloads. But say we ship with checkpoint_segments = 100,
which is a far better setting from a performance point of view. Then
pg_xlog space utilization will eventually grow to more than 3 GB, even
on a low-velocity system where the extra segments don't improve
performance. I'm not sure whether it's useful for the number of
checkpoint segments to vary dramatically on a single system, but I do
think it would be very nice if we could ship with a less conservative
default without eating up so much disk space. Maybe there's a better
way of going about that, but I agree with Simon's point that the
setting is often wrong. Frequently it's too low; sometimes it's too
high; occasionally it's got both problems simultaneously. If you have
another idea on how to improve this, I'm all ears.
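
For reference, the 3 GB figure is just the documented steady-state
bound of (2 + checkpoint_completion_target) * checkpoint_segments + 1
segments at 16 MB each.  A minimal standalone sketch of that
arithmetic (not PostgreSQL source; assumes the default 16 MB segment
size and checkpoint_completion_target = 0.5):

/*
 * Rough sketch: estimate the steady-state pg_xlog footprint for a
 * given checkpoint_segments, using the documented upper bound of
 * (2 + checkpoint_completion_target) * checkpoint_segments + 1 files
 * and the default 16 MB segment size.
 */
#include <stdio.h>

int
main(void)
{
    const double segment_mb = 16.0;        /* default WAL segment size */
    const double completion_target = 0.5;  /* default checkpoint_completion_target */
    int checkpoint_segments = 100;

    double max_files = (2 + completion_target) * checkpoint_segments + 1;

    printf("checkpoint_segments = %d -> up to ~%.0f segments, ~%.1f GB\n",
           checkpoint_segments, max_files, max_files * segment_mb / 1024.0);
    return 0;
}

With those defaults that works out to roughly 251 segments, or about
3.9 GB of pg_xlog, whether or not the workload ever benefits from it.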

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
