Re: Load distributed checkpoint

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Load distributed checkpoint
Date: 2006-12-08 16:43:27
Message-ID: 6439.1165596207@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> "Jim C. Nasby" <jim(at)nasby(dot)net> wrote:
>> Generally, I try and configure the all* settings so that you'll get 1
>> clock-sweep per checkpoint_timeout. It's worked pretty well, but I don't
>> have any actual tests to back that methodology up.

> We got to these numbers somewhat scientifically. I studied I/O
> patterns under production load and figured we should be able to handle
> about 800 writes per 200 ms without causing problems. I have to
> admit that I based the percentages and the ratio between "all" and "lru"
> on gut feel after musing over the documentation.

I like Kevin's settings better than what Jim suggests. If the bgwriter
only makes one sweep between checkpoints then it's hardly going to make
any impact at all on the number of dirty buffers the checkpoint will
have to write. The point of the bgwriter is to reduce the checkpoint
I/O spike by doing writes between checkpoints, and to have any
meaningful impact on that, you'll need it to make the cycle several times.
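
To put rough numbers on "one sweep" versus "several", here is a minimal sketch of the arithmetic. It assumes the "all" scan covers a fixed percentage of shared buffers on every bgwriter round and is never cut short by the per-round write cap; the function name and the sample values are illustrative only, not anyone's actual settings from this thread.

```python
import math

def sweeps_per_checkpoint(checkpoint_timeout_s, bgwriter_delay_ms, all_percent):
    """Estimate how many full passes over shared buffers the "all" scan
    completes between two timed checkpoints.

    Assumption: each round scans all_percent percent of the buffer pool,
    and the per-round write cap never stops a round early.
    """
    if all_percent <= 0:
        return 0.0  # the "all" scan is effectively disabled
    # Rounds needed to cover 100% of the buffers once.
    rounds_per_sweep = math.ceil(100.0 / all_percent)
    # Wall-clock time for one full pass, ignoring time spent writing.
    sweep_seconds = rounds_per_sweep * bgwriter_delay_ms / 1000.0
    return checkpoint_timeout_s / sweep_seconds

if __name__ == "__main__":
    # Illustrative values: 5-minute checkpoints, 200 ms between rounds.
    for pct in (0.0667, 0.333, 5.0):
        print(pct, sweeps_per_checkpoint(300, 200, pct))
```

Under these assumptions, scanning 5% per round gives dozens of passes per checkpoint interval, while a percentage tuned for a single pass leaves most buffers visited only once before the checkpoint has to deal with them.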

Another point here is that you want checkpoints to be pretty far apart
to minimize the WAL load from full-page images. So again, a bgwriter
that's only making one loop per checkpoint is not gonna be doing much.

I wonder whether it would be feasible to teach the bgwriter to get more
aggressive as the time for the next checkpoint approaches? Writes
issued early in the interval have a much higher probability of being
wasted (because the page gets re-dirtied later). But maybe that just
reduces to what Takahiro-san already suggested, namely that
checkpoint-time writes should be done with the same kind of scheduling
the bgwriter uses outside checkpoints. We still have the problem that
the real I/O storm is triggered by fsync() not write(), and we don't
have a way to spread out the consequences of fsync().
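
As a purely hypothetical illustration of "more aggressive as the checkpoint approaches", the sketch below ramps a per-round write budget linearly with the fraction of the checkpoint interval that has elapsed. Nothing like pages_this_round exists in the bgwriter; the name, the linear ramp, and the min/max budgets are all assumptions made for the example.

```python
def pages_this_round(elapsed_s, checkpoint_timeout_s, min_pages, max_pages):
    """Hypothetical per-round write budget that grows as the next
    checkpoint gets closer.

    Early in the interval the budget stays near min_pages, since a page
    written now may well be dirtied again before the checkpoint. Late in
    the interval it approaches max_pages.
    """
    if checkpoint_timeout_s <= 0:
        raise ValueError("checkpoint_timeout_s must be positive")
    # Fraction of the interval already elapsed, clamped to [0, 1].
    frac = min(max(elapsed_s / checkpoint_timeout_s, 0.0), 1.0)
    # Linear ramp between the two budgets, truncated to whole pages.
    return int(min_pages + frac * (max_pages - min_pages))

if __name__ == "__main__":
    # Illustrative values: 5-minute interval, budget from 5 to 100 pages.
    for t in (0, 75, 150, 225, 300):
        print(t, pages_this_round(t, 300, 5, 100))
```

Note that this only shapes when write() calls are issued. It does nothing about the fsync() problem described above, which is where the real I/O storm comes from.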

regards, tom lane
