On 01/26/2011 10:04 AM, Anne Rosset wrote:
> We've been able to match long running database queries to such
> processes. This occurs under relatively low load average (say 4 out of
> 8) and can involve as little as 1 single sql query.
The b state means the process is blocking, waiting for... something. One
thing you need to consider is far more than your CPU usage. If you have
the 'sar' utility, run it as 'sar 1 100' just to see how your system is
working. What you want to watch for is iowait.
If even one query is churning your disks, every single other query that
has to take even one block from disk instead of cache, is going to
stall. If you see an iowait of anything greater than 5%, you'll want to
check further on the device that contains your database with iostat. My
favorite use of this is 'iostat -dmx [device] 1' where [device] is the
block device where your data files are, if your WAL is somewhere else.
And yeah, your shared_buffers are kinda on the lowish side. Your
effective_cache_size is good, but you have a lot more room to increase
Worse however, is your checkpoints. Lord. Increase checkpoint_segments
to *at least* 20, and increase your checkpoint_completion_target to 0.7
or 0.8. Check your logs for checkpoint warnings, and I'll bet it's
constantly complaining about increasing your checkpoint segments. Every
checkpoint not started by the scheduled system risks a checkpoint spike,
which can flood your system with IO regardless of which queries are
running. That kind of IO storm will ruin your performance, and with only
3 checkpoint segments on a busy database, are probably happening constantly.
Unfortunately we still need to know more. This is just based on your PG
settings, and that's not really enough to know how "busy" your DB is.
One way to check is to log the contents of pg_stat_database, especially
the xact_commit and xact_rollback columns. Grab those with a timestamp.
If you get a snapshot of that every minute, you can figure out how many
queries you're processing per minute or per second pretty easily. We've
hit 8600 TPS before and don't have nearly the trouble you've been reporting.
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
for terms and conditions related to this email
In response to
pgsql-performance by date
|Next:||From: David Greco||Date: 2011-01-26 19:31:58|
|Subject: Real vs Int performance|
|Previous:||From: Anne Rosset||Date: 2011-01-26 16:04:41|
|Subject: FW: Queries becoming slow under heavy load|