Re: Maximum number of WAL files in the pg_xlog directory

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Maximum number of WAL files in the pg_xlog directory
Date: 2014-10-13 19:11:36
Message-ID: 20141013191136.GA21267@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 25, 2014 at 07:12:33AM +0200, Guillaume Lelarge wrote:
> Le 8 août 2014 09:08, "Guillaume Lelarge" <guillaume(at)lelarge(dot)info> a écrit :
> >
> > Hi,
> >
> > As part of our monitoring work for our customers, we stumbled upon an issue
> with our customers' servers who have a wal_keep_segments setting higher than 0.
> >
> > We have a monitoring script that checks the number of WAL files in the
> pg_xlog directory, according to the setting of three parameters
> (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments). We
> usually add a percentage to the usual formula:
> >
> > greatest(
> >   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
> >   checkpoint_segments + wal_keep_segments + 1
> > )
> >
> > And we have lots of alerts from the script for customers who set their
> wal_keep_segments setting higher than 0.
> >
> > So we started to question this sentence of the documentation:
> >
> > There will always be at least one WAL segment file, and will normally not be
> more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or
> checkpoint_segments + wal_keep_segments + 1 files.
> >
> > (http://www.postgresql.org/docs/9.3/static/wal-configuration.html)
> >
> > While doing some tests, it appears it would be more something like:
> >
> > wal_keep_segments + (2 + checkpoint_completion_target) * checkpoint_segments
> + 1
> >
> > But after reading the source code (src/backend/access/transam/xlog.c), the
> right formula seems to be:
> >
> > wal_keep_segments + 2 * checkpoint_segments + 1
> >
> > Here is how we went to this formula...
> >
> > CreateCheckPoint(..) is responsible, among other things, for deleting and
> recycling old WAL files. From src/backend/access/transam/xlog.c, master branch,
> line 8363:
> >
> > /*
> >  * Delete old log files (those no longer needed even for previous
> >  * checkpoint or the standbys in XLOG streaming).
> >  */
> > if (_logSegNo)
> > {
> >     KeepLogSeg(recptr, &_logSegNo);
> >     _logSegNo--;
> >     RemoveOldXlogFiles(_logSegNo, recptr);
> > }
> >
> > KeepLogSeg(...) function takes care of wal_keep_segments. From src/backend/
> access/transam/xlog.c, master branch, line 8792:
> >
> > /* compute limit for wal_keep_segments first */
> > if (wal_keep_segments > 0)
> > {
> >     /* avoid underflow, don't go below 1 */
> >     if (segno <= wal_keep_segments)
> >         segno = 1;
> >     else
> >         segno = segno - wal_keep_segments;
> > }
> >
> > IOW, the segment number (segno) is decremented according to the setting of
> wal_keep_segments. segno is then sent back to CreateCheckPoint(...) via
> _logSegNo. The RemoveOldXlogFiles() gets this segment number so that it can
> remove or recycle all files before this segment number. This function gets the
> number of WAL files to recycle with the XLOGfileslop constant, which is defined
> as:
> >
> > /*
> >  * XLOGfileslop is the maximum number of preallocated future XLOG segments.
> >  * When we are done with an old XLOG segment file, we will recycle it as a
> >  * future XLOG segment as long as there aren't already XLOGfileslop future
> >  * segments; else we'll delete it.  This could be made a separate GUC
> >  * variable, but at present I think it's sufficient to hardwire it as
> >  * 2*CheckPointSegments+1.  Under normal conditions, a checkpoint will free
> >  * no more than 2*CheckPointSegments log segments, and we want to recycle all
> >  * of them; the +1 allows boundary cases to happen without wasting a
> >  * delete/create-segment cycle.
> >  */
> > #define XLOGfileslop    (2*CheckPointSegments + 1)
> >
> > (in src/backend/access/transam/xlog.c, master branch, line 100)
> >
> > IOW, PostgreSQL will keep wal_keep_segments WAL files before the current WAL
> file, and then there may be 2*CheckPointSegments + 1 recycled ones. Hence the
> formula:
> >
> > wal_keep_segments + 2 * checkpoint_segments + 1
> >
> > And this is what we usually find in our customers' servers. We may find more
> WAL files, depending on the write activity of the cluster, but in average, we
> get this number of WAL files.
> >
> > AFAICT, the documentation is wrong about the usual number of WAL files in the
> pg_xlog directory. But I may be wrong, in which case, the documentation isn't
> clear enough for me, and should be fixed so that others can't misinterpret it
> like I may have done.
> >
> > Any comments? did I miss something, or should we fix the documentation?

I looked into this, and came up with more questions. Why is
checkpoint_completion_target involved in the total number of WAL
segments? If checkpoint_completion_target is 0.5 (the default), the
calculation is:

(2 + 0.5) * checkpoint_segments + 1

while if it is 0.9, it is:

(2 + 0.9) * checkpoint_segments + 1

Is this trying to estimate how many WAL files are going to be created
during the checkpoint? If so, wouldn't it be (1 +
checkpoint_completion_target), not "2 +". My logic is you have the old
WAL files being checkpointed (that's the "1"), plus you have new WAL
files being created during the checkpoint, which would be
checkpoint_completion_target * checkpoint_segments, plus one for the
current WAL file.

The original calculation is summarized in this email:

http://www.postgresql.org/message-id/AANLkTi=e=oR54OuxAw88=dtV4wt0e5edMiGaeZtBVcKO@mail.gmail.com

However, in my reading of this, it appears to be double-counting the WAL
files during the checkpoint, e.g. the checkpoint_completion_target *
checkpoint_segments WAL files are also part of the later
checkpoint_segments number.

I also don't see how that can be equivalent to:

checkpoint_segments + wal_keep_segments + 1

because wal_keep_segments isn't used in the first calculation. Is the
user supposed to compute the maximum of those two? Seems easier to just
give one expression.

Is the right answer:

max(checkpoint_segments, wal_keep_segments) + checkpoint_segments + 1

or, if you want to use checkpoint_completion_target, it would be:

max(checkpoint_segments * checkpoint_completion_target, wal_keep_segments) + checkpoint_segments + 1

Is checkpoint_completion_target accurate enough to define a maximum
number of files?

I think I need Masao Fujii's comments on this. The fact the user is
seeing something different from what is documented means something
probably needs updating.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Smith 2014-10-13 19:15:30 Re: pgbench throttling latency limit
Previous Message Alvaro Herrera 2014-10-13 18:28:43 Re: Missing IPv6 for pgbuildfarm.org