Re: WAL partition filling up after high WAL activity

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: WAL partition filling up after high WAL activity
Date: 2011-11-12 06:31:16
Message-ID: 4EBE12B4.5020705@2ndQuadrant.com
Lists: pgsql-performance

On 11/11/2011 04:54 AM, Rafael Martinez wrote:
> Your explanation in 2) sounds like a good candidate for the problem we
> had. As I said in June, I think we need to improve the documentation in
> this area. A note in the documentation about what you have explained in
> 2) with maybe some hints about how to find out if this is happening will
> be a great improvement.
>

PostgreSQL 9.1 added a new counter to pg_stat_bgwriter that tracks when
the problem I described happens. Without that counter, it's hard to
identify the problem specifically unless you change the source code in
some way; before the counter existed, I added new logging to the server
code to track it down. The only thing you can easily look at that tends
to correlate well with the worst problems here is the output from
turning log_checkpoints on. Specifically, the "sync" times going way up
is a sign there's a problem with write speed.
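
As a rough illustration of watching those "sync" times, here is a minimal
Python sketch that scans server log lines for checkpoint timings and
flags the slow ones. Several things in it are assumptions rather than
anything from the server documentation: the pattern assumes a
"checkpoint complete" line carrying write=, sync= and total= fields in
seconds, which may differ between PostgreSQL versions; the function
names are hypothetical; the 10 second threshold is arbitrary; and the
sample log lines are made up.

```python
import re

# Assumed shape of a "checkpoint complete" log line. The exact wording can
# vary between PostgreSQL versions, so adjust the pattern to your own logs.
TIMING = re.compile(
    r"checkpoint complete:.*?"
    r"write=(?P<write>\d+\.\d+) s, "
    r"sync=(?P<sync>\d+\.\d+) s, "
    r"total=(?P<total>\d+\.\d+) s"
)


def parse_checkpoint(line):
    """Return write/sync/total seconds as floats, or None if no match."""
    match = TIMING.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


def slow_syncs(lines, threshold_s=10.0):
    """Yield timings for checkpoints whose sync phase exceeded threshold_s.

    threshold_s is an arbitrary illustrative cutoff, not a recommendation.
    """
    for line in lines:
        timing = parse_checkpoint(line)
        if timing is not None and timing["sync"] > threshold_s:
            yield timing


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "LOG:  checkpoint complete: wrote 120 buffers (0.7%); "
        "write=11.902 s, sync=0.215 s, total=12.140 s",
        "LOG:  checkpoint complete: wrote 9041 buffers (55.2%); "
        "write=29.512 s, sync=41.207 s, total=70.901 s",
    ]
    for timing in slow_syncs(sample):
        print(timing)
```

Feeding it the real log file instead of the sample list (for example,
slow_syncs(open(path)) with your own path) would list only the
checkpoints worth a closer look.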

As for the documentation, not much has really changed from when you
brought this up on the docs list. The number of WAL files that can be
created by a "short-term peak" is unlimited, which is why there's no
better limit listed than that. Some of the underlying things that make
the problem worse are operating system level issues, not ones in the
database itself; the PostgreSQL documentation doesn't try to wander too
far into that level. There are also a large number of things you can do
at the application level that will generate a lot of WAL activity. It
would be impractical to list all of them in the checkpoint documentation
though.

On reviewing this section of the docs again, one thing that we could do
is make the "WAL Configuration" section talk more about log_checkpoints
and interpreting its output. Right now there's no mention of that
parameter in the section that talks about parameters to configure; there
really should be.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
