Re: Maximum number of WAL files in the pg_xlog directory

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Maximum number of WAL files in the pg_xlog directory
Date: 2014-10-14 16:20:22
Message-ID: CAMkU=1yfiJh-UYcujka9dKZKEm+6U=DyvJaM8+ABeBQ+bAdWLQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 13, 2014 at 12:11 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

>
> I looked into this, and came up with more questions. Why is
> checkpoint_completion_target involved in the total number of WAL
> segments? If checkpoint_completion_target is 0.5 (the default), the
> calculation is:
>
> (2 + 0.5) * checkpoint_segments + 1
>
> while if it is 0.9, it is:
>
> (2 + 0.9) * checkpoint_segments + 1
>
> Is this trying to estimate how many WAL files are going to be created
> during the checkpoint? If so, wouldn't it be (1 +
> checkpoint_completion_target), not "2 +". My logic is you have the old
> WAL files being checkpointed (that's the "1"), plus you have new WAL
> files being created during the checkpoint, which would be
> checkpoint_completion_target * checkpoint_segments, plus one for the
> current WAL file.
>

WAL is not eligible to be recycled until there have been 2 successful
checkpoints.

So at the end of a checkpoint, you have 1 cycle of WAL which has just
become eligible for recycling, 1 cycle of WAL which is now expendable but
which is kept anyway, and checkpoint_completion_target worth of WAL which
was generated while the checkpoint was in progress and is still needed for
crash recovery. Those two full cycles are where the "2 +" comes from.
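
To make that arithmetic concrete, here is a rough sketch in Python. It
assumes nothing beyond the formula quoted above; the function name and the
dictionary keys are made up for illustration and do not correspond to
anything in the PostgreSQL source.

```python
def wal_segment_estimate(checkpoint_segments, checkpoint_completion_target=0.5):
    """Break the quoted formula into the pieces described above.

    (2 + checkpoint_completion_target) * checkpoint_segments + 1
    All names here are hypothetical, chosen only for this sketch.
    """
    parts = {
        # one cycle of WAL that has just become eligible for recycling
        "just_recyclable": 1 * checkpoint_segments,
        # one cycle that is expendable but kept anyway
        "kept_anyway": 1 * checkpoint_segments,
        # WAL generated while the checkpoint itself was running
        "during_checkpoint": checkpoint_completion_target * checkpoint_segments,
        # the segment currently being written
        "current": 1,
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    for target in (0.5, 0.9):
        est = wal_segment_estimate(10, target)
        print(target, est["total"])
```

The total is left as a possibly fractional number, since the formula itself
says nothing about how it should be rounded.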

I don't really understand the point of this way of doing things. I guess
it is because the control file contains two redo pointers, one for the last
checkpoint and one for the checkpoint before that, and if recovery finds
that it can't use the most recent one, it tries the one before it.
Why? Beats me. If we are worried about the control file getting a corrupt
redo pointer, it seems like we would record the last one twice, rather than
recording two different ones once each. And if the in-memory version got
corrupted before being written to the file, I really doubt anything is
going to save your bacon at that point.

I've never seen a case where recovery couldn't use the last recorded good
checkpoint, so instead used the previous one, and was successful at it.
But then again I haven't seen all possible crashes.

This is based on memory from the last time I looked into this; I haven't
re-verified it, so it could be wrong or obsolete.

Cheers,

Jeff
