Re: Remove secondary checkpoint

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remove secondary checkpoint
Date: 2017-11-07 17:23:49
Message-ID: CANP8+jJGhb6uas1iSMcG_6Z73M27X1yEtWRO_iYeD1ZxDX_t0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 October 2017 at 12:01, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Oct 31, 2017 at 2:00 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 30 October 2017 at 18:58, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> On 30 October 2017 at 15:22, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>
>>>>> You forgot to update this formula in xlog.c:
>>>>> distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
>>>>> /* add 10% for good measure. */
>>>>> distance *= 1.10;
>>>>> You can guess easily where the mistake is.
>>>>
>>>> Doh - fixed that before posting, so I must have sent the wrong patch.
>>>>
>>>> It's the 10%, right? ;-)
>>>
>>> So, there are 2 locations that mention 2.0 in xlog.c
>>>
>>> I had already fixed one, which is why I remembered editing it.
>>>
>>> The other one you mention has a detailed comment above it explaining
>>> why it should be 2.0 rather than 1.0, so I left it.
>>>
>>> Let me know if you still think it should be removed? I can see the
>>> argument both ways.
>>> Or maybe we need another patch to account for manual checkpoints.
>>
>> Revised patch to implement this
>
> Here is the comment and the formula:
> * The reason this calculation is done from the prior
> checkpoint, not the
> * one that just finished, is that this behaves better if some
> checkpoint
> * cycles are abnormally short, like if you perform a manual checkpoint
> * right after a timed one. The manual checkpoint will make
> almost a full
> * cycle's worth of WAL segments available for recycling, because the
> * segments from the prior's prior, fully-sized checkpoint cycle are no
> * longer needed. However, the next checkpoint will make only
> few segments
> * available for recycling, the ones generated between the timed
> * checkpoint and the manual one right after that. If at the manual
> * checkpoint we only retained enough segments to get us to
> the next timed
> * one, and removed the rest, then at the next checkpoint we would not
> * have enough segments around for recycling, to get us to the
> checkpoint
> * after that. Basing the calculations on the distance from
> the prior redo
> * pointer largely fixes that problem.
> */
> distance = (2.0 + CheckPointCompletionTarget) *
> CheckPointDistanceEstimate;
>
> While the mention about a manual checkpoint happening after a timed
> one will cause a full range of WAL segments to be recycled, it is not
> actually true that segments of the prior's prior checkpoint are not
> needed, because with your patch the segments of the prior checkpoint
> are getting recycled. So it seems to me that based on that the formula
> ought to use 1.0 instead of 2.0...

I think the argument in the comment is right, in that
CheckPointDistanceEstimate is better if we use multiple checkpoint
cycles.

But the implementation of that is bogus and multiplying by 2.0
wouldn't make it better if CheckPointDistanceEstimate is wrong.

So I will change to 1.0 as you say and rewrite the comment. Thanks for
your review.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2017-11-07 17:26:58 Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Previous Message Alexander Korotkov 2017-11-07 16:58:35 Re: Fix bloom WAL tap test