Re: WAL file utilization question

From: Keaton Adams <kadams(at)mxlogic(dot)com>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: WAL file utilization question
Date: 2007-05-16 16:49:04
Message-ID: 1179334144.22514.14.camel@MXLRMT-208.corp.mxlogic.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Well, no, I'm not that paranoid where I expect checkpoints to be taking
place so often. I do find it interesting that controlling checkpoint
frequency is a factor of the number of WAL files available. In order to
get up to 60 second checkpoints I had to set checkpoint_segments to 25,
which resulted in the creation of 52 log files in data/pg_xlog. So for
120 second checkpoints I will need to have up to 104 log files in
data/pg_xlog available? That's nearly 2 GB of space for WAL logs, and
it only nets on average a 2 minute checkpoint. That would appear to be
the pattern (2 * checkpoint_segments at the busiest time):

# - Checkpoints -

#3 - 8 seconds - 3 log files
#6 - 16 seconds - 13 log files
#9 - 29 seconds - 19 log files
#20 - 57 seconds - 41 log files
#25 - > 60 sec - 52 log files

checkpoint_segments = 25 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300 # range 30-3600, in seconds
#checkpoint_warning = 60 # in seconds, 0 is off

So, if I do set the checkpoint_segments parameter to 50, 75, or even 100
I will have a considerable number of WAL files in data/pg_xlog, but they
will be used more efficiently, so I will be archiving less? That's what
I understand from the comments received so far.

On PITR it makes sense, given the nature of the WAL file, that only full
pages can be used. So I won't be able to set full_page_writes to false
in this case.

-Keaton

On Wed, 2007-05-16 at 11:35 -0500, Jim C. Nasby wrote:

> On Wed, May 16, 2007 at 10:28:29AM -0600, Keaton Adams wrote:
> > On most database systems I am used to, there is a physical log and
> > separate logical log. The physical log holds any ???before images??? of data
> > pages that have been modified since the last checkpoint. The logical log
> > holds the individual rows of data that have been inserted, updated,
> > deleted as well as checkpoint records and the like. If the server
> > crashes, fast recovery is initiated where (1) any pages in the physical
> > log are put back in the data files (tablespaces/dbspaces) on disk to get
> > back to a state of physical consistency and (2) individual transactions
> > in the logical log since the last checkpoint are rolled forward / rolled
> > back to get to a point of logical consistency.
>
> PostgreSQL combines the two, though there has been recent discussion
> about changing that. There may be some improvements in this regard in
> 8.3 (I don't remember if the patches were accepted or not).
>
> > Even with full_page_writes set to false and checkpoints taking place
> > every 60 seconds or so, the amount of WAL log data generated per minute
> > seems to be significant.
>
> Wow, do you really want to be checkpointing every 60 seconds? That's
> going to greatly increase your WAL volume, as well as the size of WAL.
>
> > So my question is this: If I kick off a transaction that loads records
> > with a size of 100 bytes, does the insert for that record take 100 bytes
> > in the WAL file, or is the data archived in the WAL log in page size
> > (8k) portions? So with piggyback commits if I can only stack up 2K worth
> > of data before the next LogFlush, will 2K be written to the WAL file, or
> > will 8K be written each time regardless of the amount of actual
> > transaction data that is available to flush?
>
> (Generally) WAL only records differences. The exception is that the
> first time a page is modified after a checkpoint, the entire page is
> written out to WAL.
>
> > Since there is no separate physical log to keep track of dirty/modified
> > pages since the last checkpoint I would assume that the WAL log is
> > serving a dual purpose of being able to get back to the point of
> > physical and logical database consistency, but I need to know for
> > certain that there is not a way to reduce the amount of WAL data being
> > written for the amount of transaction information we are actually
> > writing to the database at any given point in time.
>
> The only way to do it right now is to reduce the frequency of your
> checkpoints. IIRC you can't actually disable full page writes if you're
> using PITR.

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Jim C. Nasby 2007-05-16 16:51:43 Re: WAL file utilization question
Previous Message Alvaro Herrera 2007-05-16 16:35:20 Re: WAL file utilization question