Re: Background LRU Writer/free list

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Background LRU Writer/free list
Date: 2007-04-19 04:10:45
Message-ID: Pine.GSO.4.64.0704182304290.7075@westnet.com
Lists: pgsql-hackers

On Wed, 18 Apr 2007, Gregory Stark wrote:

> In particular I'm worried about what happens on a very busy cpu-bound
> system where adjusting the sleep times would result in it deciding to
> not sleep at all. On such a system sleeping for even 10ms might be too
> long... Anyways, if we have a working patch that works the other way
> around we could experiment with that and see if there are actual
> situations where sleeping for 0ms is necessary.

I've been waiting for 8.3 to settle down before packaging the prototype
auto-tuning background writer concept I'm working on (you can peek at the
code at http://www.westnet.com/~gsmith/content/postgresql/bufmgr.c ),
which already implements some of the ideas you're talking about in your
messages today. I estimate how much of the buffer pool is dirty, use that
to compute an expected I/O rate, and try to adjust parameters to meet a
quality of service guarantee for how often the entire buffer pool is
scanned. This is one of those problems that gets more difficult the more
you dig into it; with all that done I still feel like I'm only halfway
finished, and several parts worked radically differently in reality than I
expected them to.
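
To make the estimate described above concrete, here is a minimal sketch of turning a dirty-page estimate and a scan-period target into a required write rate. It is not taken from the linked bufmgr.c; the function name, parameters, and the assumption that every dirty page found during a scan gets written are all hypothetical and purely illustrative.

```python
def expected_write_rate(pool_pages, dirty_fraction, scan_period_s):
    """Pages per second that must be written if the whole buffer pool is
    scanned once every scan_period_s seconds and every dirty page found is
    written out. All three parameters are illustrative assumptions."""
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be between 0 and 1")
    if scan_period_s <= 0:
        raise ValueError("scan_period_s must be positive")
    dirty_pages = pool_pages * dirty_fraction
    return dirty_pages / scan_period_s


if __name__ == "__main__":
    # A 100000-page pool that is 25% dirty, scanned once every 10 seconds.
    print(expected_write_rate(100000, 0.25, 10))
```

A real implementation would have to keep re-estimating the dirty fraction as the workload changes, which is part of why the problem gets harder the more you dig into it.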

If you're allowing the background writer to write 1000 pages at a clip,
that's 8MB each interval. Doing that every 200ms makes for an I/O rate of
40MB/s. In a system that cares about data integrity, you'll exceed the
ability of the WAL to sustain page writes (which limits how fast you can
dirty pages) long before the interval approaches 0ms. What I do in my
code is set the interval to 200ms, compute what the maximum pages to write
must be, and if it's >1000 then I reduce the interval. I've tested
dumping into a fairly fast disk array with tons of cache and I've never
been able to get useful throughput below an 80ms interval; the OS just
clamps down and makes you wait for I/O instead regardless of how little
you intended to sleep. Eventually, it's got to hit disk, and you can only
buffer for so long before that starts to slow you down.
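
The arithmetic and the interval rule above can be sketched as follows. This is only an illustration under stated assumptions: the page size is taken as 8 kB with 1 MB = 1000 kB to match the round figures in the text, the way the interval is shortened (just enough to keep the batch at 1000 pages) is a guess rather than what the prototype does, and the 80ms floor is simply the value observed on one disk array. All names are hypothetical.

```python
import math

PAGE_KB = 8                    # assumed page size in kB
DEFAULT_INTERVAL_MS = 200      # starting interval from the text
MAX_PAGES_PER_ROUND = 1000     # batch ceiling from the text
MIN_USEFUL_INTERVAL_MS = 80    # assumed floor, from the observation above


def io_rate_mb_per_s(pages_per_round, interval_ms):
    """I/O rate implied by writing pages_per_round pages every interval_ms."""
    mb_per_round = pages_per_round * PAGE_KB / 1000
    return mb_per_round * (1000 / interval_ms)


def plan_round(required_pages_per_s):
    """Return (interval_ms, pages_per_round) for a required write rate.

    Start at the default interval; if that would need more than the batch
    ceiling, shorten the interval instead, but never below the floor."""
    interval_ms = DEFAULT_INTERVAL_MS
    pages = math.ceil(required_pages_per_s * interval_ms / 1000)
    if pages > MAX_PAGES_PER_ROUND:
        pages = MAX_PAGES_PER_ROUND
        interval_ms = max(MIN_USEFUL_INTERVAL_MS,
                          1000 * MAX_PAGES_PER_ROUND / required_pages_per_s)
    return interval_ms, pages


if __name__ == "__main__":
    print(io_rate_mb_per_s(1000, 200))   # 1000 pages every 200ms
    print(plan_round(2500))              # fits within the default interval
    print(plan_round(10000))             # needs a shorter interval
```

Once the interval hits the floor, the requested rate can no longer be met in this sketch; that corresponds to the point where the OS makes you wait for I/O no matter how little you meant to sleep.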

Anyway, this is a tangential discussion. The LRU patch that's in the queue
doesn't really care if it runs with a short interval or a long one,
because it automatically scales how much work it does according to how
much time has passed. I think that may only be a bit of tweaking away from a
solid solution. Tuning the all scan, which is what you're talking about
when you speak in terms of the statistics about the overall buffer pool,
is a much harder job.
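
As a rough illustration of what scaling the work to the elapsed time can look like, here is a minimal sketch. The message does not describe how the LRU patch actually does this, so the linear rule, the per-round cap, and every name below are assumptions made for the example.

```python
def pages_to_scan(elapsed_ms, target_pages_per_s, max_pages):
    """Number of pages to process this round, proportional to the time
    since the previous round and capped at max_pages (hypothetical rule)."""
    if elapsed_ms < 0:
        raise ValueError("elapsed_ms must not be negative")
    pages = int(target_pages_per_s * elapsed_ms / 1000)
    return min(pages, max_pages)


if __name__ == "__main__":
    # The same target rate yields different batch sizes per interval.
    for elapsed in (20, 200, 5000):
        print(elapsed, pages_to_scan(elapsed, 500, 1000))
```

With a rule of this shape, a short interval does a little work often and a long interval does more work less often, so the total over time stays about the same until the cap is reached.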

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
