On Tue, Feb 1, 2011 at 12:58 PM, Kevin Grittner wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I also think Bruce's idea of calling fsync() on each relation just
>> *before* we start writing the pages from that relation might have
>> some merit.
> What bothers me about that is that you may have a lot of the same
> dirty pages in the OS cache as the PostgreSQL cache, and you've just
> ensured that the OS will write those *twice*. I'm pretty sure that
> the reason the aggressive background writer settings we use have not
> caused any noticeable increase in OS disk writes is that many
> PostgreSQL writes of the same buffer keep an OS buffer page from
> becoming stale enough to get flushed until PostgreSQL writes to it
> taper off. Calling fsync() right before doing "one last push" of
> the data could be really pessimal for some workloads.
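The toy model below makes Kevin's double-write concern concrete. It is hypothetical and is not PostgreSQL code. It treats one relation's slice of the OS page cache as a set of dirty pages and assumes, as the quoted paragraph does, that the OS does not write a page back on its own while PostgreSQL keeps rewriting it. It then counts physical writes for "fsync() first, then the last push" against "one fsync() at the end".

```python
"""Toy model: a page dirty in both PostgreSQL's buffers and the OS cache
is written once by an early fsync() and again by the final fsync()."""


class OsCache:
    """Hypothetical stand-in for one relation's pages in the OS page cache."""

    def __init__(self):
        self.dirty = set()      # pages handed to the OS but not yet on disk
        self.disk_writes = 0    # physical page writes forced by fsync()

    def write(self, page):
        # write() only dirties the OS copy; assumed not to reach disk yet.
        self.dirty.add(page)

    def fsync(self):
        # fsync() forces every dirty page of the file to disk.
        self.disk_writes += len(self.dirty)
        self.dirty.clear()


def checkpoint(os_dirty, pg_dirty, fsync_first):
    """Count disk writes for one checkpoint of a single relation.

    os_dirty:    pages the background writer already wrote to the OS.
    pg_dirty:    pages still dirty in shared buffers ("one last push").
    fsync_first: if True, fsync() before writing pg_dirty, as proposed.
    """
    cache = OsCache()
    for page in os_dirty:
        cache.write(page)
    if fsync_first:
        cache.fsync()
    for page in pg_dirty:
        cache.write(page)
    cache.fsync()
    return cache.disk_writes
```

With 100 pages already in the OS cache and 40 of them written again by the checkpoint, `checkpoint(range(100), range(40), fsync_first=False)` costs 100 page writes while `fsync_first=True` costs 140: the 40 overlapping pages go to disk twice. When the two sets do not overlap, both orderings cost the same, so the penalty depends entirely on how much the OS and PostgreSQL dirty sets coincide.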
I was thinking about what Greg reported here:
If the amount of pre-checkpoint dirty data is 3GB and the checkpoint
is writing 250MB, then you shouldn't have all that many extra
writes... but you might have some, and that might be enough to send
the whole thing down the tubes.
InnoDB apparently handles this problem by advancing the redo pointer
in small steps instead of in large jumps. AIUI, in addition to
tracking the LSN of each page, they also track the first-dirtied LSN.
That lets you checkpoint to an arbitrary LSN by flushing just the
pages with an older first-dirtied LSN. So instead of doing a
checkpoint every hour, you might do a mini-checkpoint every 10
minutes. Since the mini-checkpoints each need to flush less data,
they should be less disruptive than a full checkpoint. But that, too,
will generate some extra writes. Basically, any idea that involves
calling fsync() more often is going to tend to smooth out the I/O load
at the cost of some increase in the total number of writes.
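A minimal sketch of the first-dirtied-LSN idea follows, assuming a simplified in-memory buffer pool. The class and method names are hypothetical and are not taken from InnoDB or PostgreSQL. A mini-checkpoint to some target LSN writes only the buffers first dirtied before that LSN. Every buffer left dirty holds only changes at or after the target, so the redo pointer can safely advance to it.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    page_id: int
    lsn: int = 0              # LSN of the latest change to this page
    first_dirty_lsn: int = 0  # LSN that first dirtied it since last write; 0 = clean


class BufferPool:
    """Hypothetical buffer pool supporting InnoDB-style mini-checkpoints."""

    def __init__(self):
        self.buffers = {}
        self.redo_ptr = 0        # crash recovery would replay WAL from here
        self.wal_flushed = 0     # WAL known to be durable
        self.pages_written = 0   # data-page writes issued so far

    def modify(self, page_id, lsn):
        buf = self.buffers.setdefault(page_id, Buffer(page_id))
        if buf.first_dirty_lsn == 0:    # clean -> dirty: remember when
            buf.first_dirty_lsn = lsn
        buf.lsn = lsn

    def mini_checkpoint(self, target_lsn):
        """Advance the redo pointer to target_lsn, writing only pages first
        dirtied before it; later-dirtied pages hold no older changes."""
        for buf in self.buffers.values():
            if 0 < buf.first_dirty_lsn < target_lsn:
                # WAL-before-data: WAL must be durable up to the page's LSN.
                self.wal_flushed = max(self.wal_flushed, buf.lsn)
                self.pages_written += 1     # stand-in for write() + fsync()
                buf.first_dirty_lsn = 0     # page is clean again
        self.redo_ptr = target_lsn
```

For example, after `modify(1, 100)`, `modify(2, 150)`, `modify(1, 200)` and `modify(3, 250)`, calling `mini_checkpoint(160)` writes pages 1 and 2, leaves page 3 dirty, and moves the redo pointer to 160. If page 1 is then modified again, a later mini-checkpoint writes it a second time; that repeated write is the extra I/O this approach trades for smaller, more frequent flushes.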
If we don't want any increase at all in the number of writes,
spreading out the fsync() calls is pretty much the only other option.
I'm worried that even with good tuning that won't be enough to tamp
down the latency spikes. But maybe it will be...
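Spreading out the fsync() calls changes only when writes hit the disk, not how many there are. The sketch below is a hypothetical scheduler, not the code from any posted patch. It starts one fsync per pending file at evenly spaced times across a fixed window. The clock and sleep functions are injectable so the schedule can be checked without real waiting, and `fsync_path` assumes a POSIX system.

```python
import os
import time
from typing import Callable, Sequence


def spread_fsyncs(paths: Sequence[str], window_s: float,
                  do_fsync: Callable[[str], None],
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Issue one fsync per file, starting them at evenly spaced times
    across window_s seconds instead of back to back."""
    if not paths:
        return
    interval = window_s / len(paths)
    start = clock()
    for i, path in enumerate(paths):
        delay = start + i * interval - clock()
        if delay > 0:           # ahead of schedule: wait for this slot
            sleep(delay)
        do_fsync(path)          # behind schedule (slow fsync): go now


def fsync_path(path: str) -> None:
    """Force one file's dirty OS pages to disk (POSIX)."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```

With four pending files and an 8-second window, the fsyncs start at roughly 0, 2, 4 and 6 seconds, e.g. `spread_fsyncs(pending_files, 8.0, fsync_path)` where `pending_files` is a hypothetical list of relation paths. If one fsync takes longer than its slot, the next one starts immediately rather than compounding the delay, which is exactly the case where spacing alone may not be enough to tamp down a latency spike.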
The Enterprise PostgreSQL Company