On Aug 22, 2007, at 10:57 AM, Kenneth Marshall wrote:
> On Wed, Aug 22, 2007 at 07:33:35PM +0400, Dmitry Potapov wrote:
>> We run a large (~66Gb) web-backend database on Postgresql
>> 8.2.4 on
>> Linux. The hardware is Dual Xeon 5130 with 16Gb ram, LSI Megaraid
>> scsi controller w/512Mb writeback cache and a BBU. Storage setup
>> contains 3
>> raid10 arrays (data, xlog, indexes, each on different array), 12
>> HDDs total.
>> Frontend application uses jdbc driver, connection pooling and
>> We've run into an issue of IO storms on checkpoints. Once in
>> (which is checkpoint_interval) the database becomes unresponsive
>> for about
>> 4-8 seconds. Query processing is suspended, server does nothing
>> but writing
>> a large amount of data to disks. Because of the db server being
>> some of the web clients get timeout and disconnect, which is
>> Even worse, as the new requests come at a pretty constant rate, by
>> the time
>> this storm comes to an end there is a huge amount of sleeping app.
>> waiting for their queries to complete. After the db server comes
>> back to
>> life again, these threads wake up and flood it with queries, so
>> suffer even more, for some minutes after the checkpoint.
>> It seemed strange to me that our 70%-read db generates so much
>> pages that writing them out takes 4-8 seconds and grabs the full
>> First, I started to tune bgwriter to a more aggressive settings,
>> but this
>> was of no help, nearly no performance changes at all. Digging into
>> the issue
>> further, I discovered that linux page cache was the reason. "Dirty"
>> parameter in /proc/meminfo (which shows the amount of ready-to-
>> write "dirty"
>> data currently sitting in page cache) grows between checkpoints
>> from 0 to
>> about 100Mb. When checkpoint comes, all the 100mb got flushed out
>> to disk,
>> effectively causing a IO storm.
>> I found this (http://www.westnet.com/~gsmith/content/linux-
>> document and
>> peeked into mm/page-writeback.c in linux kernel source tree. I'm
>> not sure
>> that I understand pdflush writeout semantics correctly, but looks
>> like when
>> the amount of "dirty" data is less than dirty_background_ratio*RAM/
>> pdflush only writes pages in background, waking up every
>> dirty_writeback_centisecs and writing no more than 1024 pages
>> (MAX_WRITEBACK_PAGES constant). When we hit
>> dirty_background_ratio, pdflush
>> starts to write out more agressively.
>> So, looks like the following scenario takes place: postgresql
>> writes something to database and xlog files, dirty data gets to
>> the page
>> cache, and then slowly written out by pdflush. When postgres
>> generates more
>> dirty pages than pdflush writes out, the amount of dirty data in the
>> pagecache is growing. When we're at checkpoint, postgres does fsync
>> () on the
>> database files, and sleeps until the whole page cache is written out.
>> By default, dirty_background_ratio is 2%, which is about 328Mb
>> of 16Gb
>> total. Following the curring pdflush logic, nearly this amount of
>> data we
>> face to write out on checkpoint effective stalling everything
>> else, so even
>> 1% of 16Gb is too much. My setup experience 4-8 sec pause in
>> operation even
>> on ~100Mb dirty pagecache...
>> I temporaly solved this problem by setting
>> dirty_background_ratio to
>> 0%. This causes the dirty data to be written out immediately. It
>> is ok for
>> our setup (mostly because of large controller cache), but it
>> doesn't looks
>> to me as an elegant solution. Is there some other way to fix this
>> without disabling pagecache and the IO smoothing it was designed
>> to perform?
> You are working at the correct level. The bgwriter performs the I/O
> function at the database level. Obviously, the OS level smoothing
> needed to be tuned and you have done that within the parameters of
> the OS.
> You may want to bring this up on the Linux kernel lists and see if
> they have
> any ideas.
> Good luck,
Have you tried decreasing you checkpoint interval? That would at
least help to reduce the amount of data that needs to be flushed when
Software Developer | Emma®
800.595.4401 or 615.292.5888
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
In response to
pgsql-performance by date
|Next:||From: Pallav Kalva||Date: 2007-08-28 15:09:00|
|Subject: 8.2.4 Chooses Bad Query Plan|
|Previous:||From: Scott Marlowe||Date: 2007-08-28 14:32:01|
|Subject: Re: server performance issues - suggestions for tuning|