Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Richard Yen <richyen(at)iparadigms(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [Slony1-general] WAL partition overloaded--by autovacuum?
Date: 2010-07-09 23:23:53
Message-ID: 4C37AF89.1070103@2ndquadrant.com
Lists: pgsql-performance

Richard Yen wrote:
> I figured that pg_xlog and data/base could both be on the FusionIO drive, since there would be no latency when there are no spindles.
>

(Rolls eyes) Please be careful about how much SSD Kool-Aid you drink,
and be skeptical of vendor claims. SSDs don't just make latency go away,
particularly on heavy write workloads where the technology is at its
weakest.

Also, random note, I'm seeing way too many FusionIO drive setups where
people don't have any redundancy to cope with a drive failure, because
the individual drives are so expensive they don't have more than one.
Make sure that if you lose one of the drives, you won't have a massive
data loss. Replication might help with that, if you can stand a little
bit of data loss when the SSD dies. Not if--when. Even if you have a
good one they don't last forever.

> This means my pg_xlog partition should be (2 + checkpoint_completion_target) * checkpoint_segments + 1 = 41 files, or 656MB. Then, if there are more than 49 files, unneeded segment files will be deleted, but in this case all segment files are needed, so they never got deleted. Perhaps we should add in the docs that pg_xlog should be the size of the DB or larger?
>

Write volume beyond the capacity of the hardware can end up delaying
the normal checkpoint that would have cleaned up all the xlog files.
There's a nasty spiral the system can get into, which I've seen a couple
of times in a form similar to what you reported. The pg_xlog directory
should never exceed the size computed by that formula for very long, but
it can burst above its normal size limits for a little bit. This is
already mentioned as a possibility in the manual: "If, due to a
short-term peak of log
output rate, there are more than 3 * checkpoint_segments + 1 segment
files, the unneeded segment files will be deleted instead of recycled
until the system gets back under this limit." Autovacuum is an easy way
to get the sort of activity needed to cause this problem, but I don't
know if it's a necessary component to see the problem. You have to be in
an unusual situation before the sum of the xlog files is anywhere close
to the size of the database though.
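
To put numbers on the two limits discussed above, here is a small Python
sketch. The settings checkpoint_segments = 16 and
checkpoint_completion_target = 0.5 are an assumption: they are not stated
in this message, but they reproduce the 41 and 49 file counts quoted. The
16MB segment size is the stock default, and the function names are made
up for illustration.

```python
# Rough pg_xlog sizing arithmetic for checkpoint_segments-era PostgreSQL.
# Assumption: WAL segments are the default 16 MB each.
SEGMENT_MB = 16


def normal_segment_count(checkpoint_segments, completion_target):
    """Steady-state ceiling: (2 + checkpoint_completion_target) * checkpoint_segments + 1."""
    return int((2 + completion_target) * checkpoint_segments) + 1


def recycle_limit(checkpoint_segments):
    """Above 3 * checkpoint_segments + 1 files, unneeded segments are deleted, not recycled."""
    return 3 * checkpoint_segments + 1


# Assumed settings, chosen because they match the figures quoted in the thread.
checkpoint_segments = 16
completion_target = 0.5

normal = normal_segment_count(checkpoint_segments, completion_target)
limit = recycle_limit(checkpoint_segments)

print("normal ceiling: %d files, %d MB" % (normal, normal * SEGMENT_MB))
print("recycle limit:  %d files, %d MB" % (limit, limit * SEGMENT_MB))
```

Neither number is a hard cap. If checkpoints are delayed because the
disks can't keep up, pg_xlog can grow past both until a checkpoint
finally completes.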

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
