Re: WAL partition filling up after high WAL activity

From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: WAL partition filling up after high WAL activity
Date: 2011-11-11 09:54:11
Message-ID: 4EBCF0C3.1040608@usit.uio.no
Lists: pgsql-performance

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/09/2011 05:06 PM, Greg Smith wrote:
> On 11/07/2011 05:18 PM, Richard Yen wrote:
>> My biggest question is: we know from the docs that there should be no
>> more than (2 + checkpoint_completion_target) * checkpoint_segments + 1
>> files. For us, that would mean no more than 48 files, which equates
>> to 384MB--far lower than the 9.7GB partition size. **Why would WAL
>> use up so much disk space?**
>>
>
> That's only true if things are operating normally. There are at least
> two ways this can fail to be a proper upper limit on space used:
>
> 1) You are archiving to a second system, and the archiving isn't keeping
> up. Things that haven't been archived can't be re-used, so more disk
> space is used.
>
> 2) Disk I/O is slow, and the checkpoint writes take a significant period
> of time. The internal scheduling assumes each individual write will
> happen without too much delay. That assumption can easily be untrue on
> a busy system. The worst I've seen now are checkpoints that take 6
> hours to sync, where the time is supposed to be a few seconds. Disk
> space in that case was a giant multiple of checkpoint_segments. (The
> source of that problem is very much improved in PostgreSQL 9.1)
>

Hello

We had a similar case in June, but we did not find the cause of our
problem. More details and information:
http://archives.postgresql.org/pgsql-docs/2011-06/msg00007.php

Your explanation in 2) sounds like a good candidate for the problem we
had. As I said in June, I think we need to improve the documentation in
this area. A note in the documentation about what you have explained in
2), maybe with some hints about how to find out whether this is
happening, would be a great improvement.
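
As a rough illustration of the kind of hint I have in mind, here is a
minimal sketch in Python that compares the number of WAL segment files
on disk with the limit given in the docs,
(2 + checkpoint_completion_target) * checkpoint_segments + 1. The
function names are my own, and the values 16 and 0.9 in the comment at
the end are only assumptions for illustration, not the settings of any
system discussed in this thread. It relies on WAL segment file names
being 24 hexadecimal characters, so history files, backup labels and
the archive_status directory are not counted.

```python
import os
import re

# WAL segment file names consist of exactly 24 hexadecimal characters.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def documented_max_wal_files(checkpoint_segments, checkpoint_completion_target):
    """Upper limit on WAL files during normal operation, per the docs."""
    return (2 + checkpoint_completion_target) * checkpoint_segments + 1


def count_wal_segments(wal_dir):
    """Count the WAL segment files in wal_dir, ignoring everything else."""
    return sum(1 for name in os.listdir(wal_dir) if WAL_SEGMENT_NAME.match(name))


def wal_over_documented_limit(wal_dir, checkpoint_segments,
                              checkpoint_completion_target):
    """Return (over_limit, files_found, limit) for the given WAL directory."""
    limit = documented_max_wal_files(checkpoint_segments,
                                     checkpoint_completion_target)
    found = count_wal_segments(wal_dir)
    return found > limit, found, limit


# Example call (the path and the two settings are assumptions):
# wal_over_documented_limit("/path/to/data/pg_xlog", 16, 0.9)
```

If the first value returned is True for a long period, that would
suggest that either archiving or checkpoint writes are not keeping up,
as described in 1) and 2) above.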

We did not understand why we experienced this problem in June when
creating a GIN index on a tsvector column. But we found out that a lot
of the tsvector data was generated from "garbage" data (base64 encoding
of huge attachments). When we generated the right tsvector data, the
creation of the GIN index ran smoothly and the problem with extra WAL
files disappeared.

PS.- In our case, the disk space used by all the extra WAL files was
almost equal to the 17GB size of our GIN index.

regards,
- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk688LoACgkQBhuKQurGihTbvQCfaSBdYNF2oOtErcx/e4u0Zw1J
pLIAn2Ztdbuz33es2uw8ddSIjj8UXe3s
=olkD
-----END PGP SIGNATURE-----
