Re: Redesigning checkpoint_segments

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redesigning checkpoint_segments
Date: 2014-12-31 10:54:38
Message-ID: 54A3D5EE.2010808@vmware.com
Lists: pgsql-hackers

(reviving an old thread)

On 08/24/2013 12:53 AM, Josh Berkus wrote:
> On 08/23/2013 02:08 PM, Heikki Linnakangas wrote:
>
>> Here's a bigger patch, which does more. It is based on the ideas in the
>> post I started this thread with, with feedback incorporated from the
>> long discussion. With this patch, WAL disk space usage is controlled by
>> two GUCs:
>>
>> min_recycle_wal_size
>> checkpoint_wal_size
>>
> <snip>
>
>> These settings are fairly intuitive for a DBA to tune. You begin by
>> figuring out how much disk space you can afford to spend on WAL, and set
>> checkpoint_wal_size to that (with some safety margin, of course). Then
>> you set checkpoint_timeout based on how long you're willing to wait for
>> recovery to finish. Finally, if you have infrequent batch jobs that need
>> a lot more WAL than the system otherwise needs, you can set
>> min_recycle_wal_size to keep enough WAL preallocated for the spikes.
>
> We'll want to rename them to make it even *more* intuitive.

Sure, I'm all ears.

> But ... do I understand things correctly that checkpoint wouldn't "kick
> in" until you hit checkpoint_wal_size? If that's the case, isn't real
> disk space usage around 2X checkpoint_wal_size if spread checkpoint is
> set to 0.9? Or does checkpoint kick in sometime earlier?

It kicks in earlier, so that the checkpoint *completes* just when
checkpoint_wal_size of WAL is used up. So the real disk usage is
checkpoint_wal_size.

There is still an internal variable called CheckPointSegments that
triggers the checkpoint, but it is now derived from checkpoint_wal_size
(see the CalculateCheckpointSegments function):

    CheckPointSegments = (checkpoint_wal_size / 16 MB) / (2 + checkpoint_completion_target)

This is the same formula we've always had in the manual for calculating
the amount of WAL space used, but in reverse. I.e. we calculate
CheckPointSegments based on the desired disk space usage, not the other
way round.
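
To make the arithmetic concrete, here is a rough sketch of the derivation in Python. It is not the code from the patch: the function names, the round-down behaviour and the one-segment floor are assumptions made for illustration, and the forward estimate leaves out any additive constant.

```python
# Illustrative sketch only; not taken from the patch.
WAL_SEGMENT_SIZE_MB = 16  # the 16 MB segment size used in the formula above


def checkpoint_segments_from_wal_size(checkpoint_wal_size_mb, completion_target):
    """Derive the checkpoint trigger (in segments) from a disk space budget.

    Hypothetical helper. Rounding down and clamping to at least one
    segment are assumptions, not behaviour confirmed by the patch.
    """
    segments = (checkpoint_wal_size_mb / WAL_SEGMENT_SIZE_MB) / (2.0 + completion_target)
    return max(1, int(segments))


def estimated_wal_usage_mb(checkpoint_segments, completion_target):
    """Forward direction: approximate WAL disk usage for a given trigger."""
    return (2.0 + completion_target) * checkpoint_segments * WAL_SEGMENT_SIZE_MB


if __name__ == "__main__":
    budget_mb = 1024
    for target in (0.5, 0.9):
        segs = checkpoint_segments_from_wal_size(budget_mb, target)
        usage = estimated_wal_usage_mb(segs, target)
        print(f"target={target}: trigger after {segs} segments, "
              f"estimated usage {usage:.1f} MB of {budget_mb} MB")
```

With a 1024 MB budget, the trigger lands well before the budget is used up, which is why the estimated usage stays at or below checkpoint_wal_size rather than at twice that.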

> As a note, pgBench would be a terrible test for this patch; we really
> need something which creates uneven traffic. I'll see if I can devise
> something.

Attached is a rebased version of this patch. Everyone, please try this
out on whatever workloads you have, and let me know:

a) How does the auto-tuning of the number of recycled segments work?
Does pg_xlog reach a steady-state size, or does it fluctuate a lot?

b) Are the two GUCs, checkpoint_wal_size and min_recycle_wal_size,
intuitive to set?

- Heikki

Attachment: redesign-checkpoint-segments-2.patch (text/x-diff, 34.2 KB)
