Re: hanging for 30sec when checkpointing

From: gjm(at)caledoncard(dot)com (Greg Mennie)
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: hanging for 30sec when checkpointing
Date: 2004-02-11 14:25:53
Message-ID: a806dcd9.0402110625.3190f48c@posting.google.com
Lists: pgsql-admin

me(at)shanewright(dot)co(dot)uk (Shane Wright) wrote in message news:<40202216(dot)4010608(at)shanewright(dot)co(dot)uk>...
> Hi,
>
> I'm running a reasonable sized (~30Gb) 7.3.4 database on Linux and I'm
> getting some weird performance at times.
>
> When the db is under medium-heavy load, it periodically spawns a
> 'checkpoint subprocess' which runs for between 15 seconds and a minute.
> Ok, fair enough, the only problem is the whole box becomes pretty much
> unresponsive during this time - from what I can gather it's because it
> writes out roughly 1Mb (vmstat says ~1034 blocks) per second until it's done.
>
> Other processes can continue to run (e.g. vmstat) but other things do
> not (other queries, mostly running 'ps fax', etc). So everything gets
> stacked up till the checkpoint finishes and all is well again, until
> the next time...

I am having a similar problem and this is what I've found so far:

During the checkpoint the write rate isn't very high, and the writing
goes on for a fairly long time (up to 20 seconds) at a rate that
appears to be well below our disk array's potential. We usually see
1-5 MB/sec written on an array that we've tested to sustain over
50 MB/sec (sequential writes, of course).
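
For scale, here is a rough back-of-the-envelope sketch of those figures.
It uses the 20-second upper bound and the 50 MB/sec sequential figure
quoted above; the function names are illustrative only, and this is
arithmetic, not a measurement.

```python
# Rough arithmetic on the figures quoted in the message; not a benchmark.

def checkpoint_volume_mb(rate_mb_s, duration_s):
    """Total data written during a checkpoint at a steady write rate."""
    return rate_mb_s * duration_s


def sequential_time_s(volume_mb, seq_rate_mb_s=50.0):
    """Time the same volume would take at the array's tested sequential rate."""
    return volume_mb / seq_rate_mb_s


if __name__ == "__main__":
    for rate in (1.0, 5.0):
        vol = checkpoint_volume_mb(rate, 20.0)
        print(f"{rate} MB/sec for 20 sec = {vol} MB; "
              f"{sequential_time_s(vol)} sec at 50 MB/sec sequential")
```

In other words, a 20-second checkpoint moves roughly 20-100 MB, which
the array could absorb in about 0.4 to 2 seconds if the writes were
sequential.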

It turns out that what's going on is that the command queue for the
RAID array (3Ware RAID card) is filling up during the checkpoint and
is staying at the max (254 commands) for most of the checkpoint. The
odd lucky insert appears to work, but is extremely slow. In our case,
the WAL files are on the same array as the data files, so everything
grinds to a halt.
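
A rough sketch of why a full queue could make individual inserts so
slow, using Little's law (wait time is roughly queue depth divided by
completion rate). It assumes, purely for illustration, that each queued
command is a single 8 kB page write (PostgreSQL's default block size)
and that the queue is serviced in order; the actual command sizes and
ordering on the 3Ware card were not measured.

```python
# Little's law estimate. The one-8kB-page-per-command and in-order
# servicing assumptions are hypothetical, not taken from measurements.

PAGE_KB = 8          # PostgreSQL default block size
QUEUE_DEPTH = 254    # observed maximum queue depth on the RAID card


def commands_per_sec(rate_mb_s, page_kb=PAGE_KB):
    """Commands completed per second if each command writes one page."""
    return rate_mb_s * 1024 / page_kb


def queue_wait_s(rate_mb_s, depth=QUEUE_DEPTH):
    """Approximate time a new command waits behind a full queue."""
    return depth / commands_per_sec(rate_mb_s)


if __name__ == "__main__":
    for rate in (1.0, 5.0):
        print(f"{rate} MB/sec: {commands_per_sec(rate):.0f} commands/sec, "
              f"~{queue_wait_s(rate):.1f} sec behind a full queue")
```

Under those assumptions, each new write (including a WAL write on the
same array) would sit behind the full queue for something on the order
of 0.4 to 2 seconds, which would be consistent with the occasional
insert succeeding but being extremely slow.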

The machine we're running it on is a dual processor box with 2GB RAM.
Since most database read operations are being satisfied from the
cache, reading processes don't seem to be affected during the pauses.

I suspect that increasing the checkpoint frequency could help, since
the burst of commands on the disk channel would be shorter. (The
checkpoint interval is currently 300 seconds.)

I have found that the checkpoint after a vacuum is the worst. This
was the original problem which led to the investigation.

Besides more frequent checkpoints, I am at a loss as to what to do
about this. Any help would be appreciated.

Thanks,

Greg
