Re: Load distributed checkpoint

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Load distributed checkpoint
Date: 2006-12-08 03:33:16
Message-ID: Pine.GSO.4.64.0612072205250.24653@westnet.com
Lists: pgsql-hackers pgsql-patches

On Thu, 7 Dec 2006, Kevin Grittner wrote:

> Between the xfs caching and the batter backed cache in the RAID...

Mmmmm, battered cache. You can deep fry anything nowadays.

> Would the background writer be disabled during this extended checkpoint?

The background writer is the same process that does the full buffer sweep
at checkpoint time. You wouldn't have to disable it because it would be
busy doing this extended checkpoint instead of its normal job.

> How is it better to concentrate step 2 in an extended checkpoint
> periodically rather than consistently in the background writer?

Right now, when the checkpoint flush is occurring, there is no background
writer active--that process is handling the checkpoint. Itagaki's
suggestion is basically to take the current checkpoint code, which runs
all in one burst, and spread it out over time. I like the concept, as
I've seen the behavior he's describing (even after tuning the background
writer like you suggest and doing Linux disk tuning as Ron describes), but
I think solving the problem is a little harder than suggested.

I have two concerns with the logic behind this approach. The first is
that if the background writer isn't keeping up with writing out all the
dirty pages, what makes you think that running the checkpoint with a
similar level of activity is going to? If your checkpoint is taking a
long time, it's because the background writer has an overwhelming load and
needs to be bailed out. Slowing down the writes with a lazier checkpoint
process introduces the possibility that you'll hit a second checkpoint
request before you're even finished cleaning up the first one, and then
you're really in trouble.
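
To make the first concern concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from Itagaki's patch; the function names and every number in it are hypothetical, and it assumes the throttled checkpoint writes at a fixed rate while ignoring new pages being dirtied in the meantime.

```python
def spread_checkpoint_seconds(dirty_pages, pages_per_sec):
    """Seconds a throttled checkpoint needs to write out dirty_pages.

    Assumes a constant write rate (a simplification for illustration).
    """
    return dirty_pages / pages_per_sec


def overruns_next_checkpoint(dirty_pages, pages_per_sec, secs_until_next):
    """True if the spread-out checkpoint is still running when the
    next checkpoint request arrives."""
    return spread_checkpoint_seconds(dirty_pages, pages_per_sec) > secs_until_next


# Hypothetical load: 100,000 dirty pages, next checkpoint due in 300 s.
# At 200 pages/s the checkpoint needs 500 s and collides with the next one;
# at 1000 pages/s it needs 100 s and finishes in time.
slow = overruns_next_checkpoint(100_000, 200, 300)
fast = overruns_next_checkpoint(100_000, 1000, 300)
```

The point of the sketch is only that the slower you make the checkpoint writes, the narrower the margin before the next request shows up.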

Second, the assumption here is that it's writing the dirty buffers out
that is the primary cause of the ugly slowdown. I too believe it could
just as easily be the fsync when it's done that's killing you, and slowing
down the writes isn't necessarily going to make that faster.
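
One way to check which of the two is hurting on a given system is to time them separately. The following is a minimal sketch, not PostgreSQL code: the helper name is made up, it writes to a scratch file rather than a real relation, and the 8192-byte page size simply mirrors PostgreSQL's default block size.

```python
import os
import tempfile
import time


def time_write_and_fsync(n_pages, page_size=8192):
    """Write n_pages zero-filled pages to a temp file, then fsync it.

    Returns (write_seconds, fsync_seconds) so the two costs can be
    compared. The writes usually land in the OS cache; the fsync is
    what forces them to disk.
    """
    page = b"\0" * page_size
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.perf_counter()
        for _ in range(n_pages):
            os.write(fd, page)
        t1 = time.perf_counter()
        os.fsync(fd)
        t2 = time.perf_counter()
    finally:
        os.close(fd)
        os.unlink(path)
    return t1 - t0, t2 - t1


write_secs, fsync_secs = time_write_and_fsync(1000)
print(f"write: {write_secs:.4f}s  fsync: {fsync_secs:.4f}s")
```

If the fsync time dominates, spreading the writes out over a longer period may not change much, since the expensive part still happens in one piece at the end.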

> Doesn't the file system caching logic combined with a battery backed
> cache in the controller cover this, or is your patch to help out those
> who don't have battery backed controller cache?

Unless your shared buffer pool is so small that you can write it all out
onto the cache, that won't help much with this problem.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
