Re: Just-in-time Background Writer Patch+Test Results

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Just-in-time Background Writer Patch+Test Results
Date: 2007-09-08 20:26:15
Message-ID: Pine.GSO.4.64.0709081501480.2440@westnet.com
Lists: pgsql-hackers

On Thu, 6 Sep 2007, Decibel! wrote:

> I don't know that there should be a direct correlation, but ISTM that
> scan_whole_pool_seconds should take checkpoint intervals into account
> somehow.

Any direct correlation is weak at this point. The LRU cleaner has a small
impact on checkpoints, in that it's writing out buffers that may make the
checkpoint quicker. But this particular write trickling mechanism is not
aimed directly at flushing the whole pool; it's more about smoothing out
idle periods a bit.

Also, computing the checkpoint interval is itself tricky. Heikki had to
put some work into getting something that took into account both the
timeout and segments mechanisms to gauge progress, and I'm not sure I can
directly re-use that because it's really only doing that while the
checkpoint is active. I'm not saying it's a bad idea to have the expected
interval as an input to the model, just that it's not obvious to me how to
do it and whether it would really help.

> I like the idea of not having that as a GUC, but I'm doubtful that it
> can be hard-coded like that. What if checkpoint_timeout is set to 120?
> Or 60? Or 2000?

Someone using 60 or 120 has checkpoint problems far bigger than the LRU
cleaner can be expected to help with. How quickly it pushes out the
reusable buffers it can write is the least of their problems. Also, I'd expect
that the only cases using such a low value for a good reason are doing so
because they have enormous amounts of activity on their system, and in
that case the primary JIT mechanism should dominate how the LRU cleaner
treats them. scan_whole_pool_seconds doesn't do anything if the primary
mechanism was already planning to scan more buffers than it aims for.
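
As a rough illustration of that last point, here is a minimal sketch in
Python (not the patch's C code) of how a time-based floor on scanning could
interact with the primary estimate. The function and parameter names are
hypothetical, and the 200 ms delay between cleaner rounds is an assumed
setting; only the 2 minute target comes from this message.

```python
def buffers_to_scan(jit_estimate, pool_size, delay_ms=200,
                    scan_whole_pool_seconds=120):
    """Return how many buffers one cleaner round scans in this sketch.

    jit_estimate is what the primary (just-in-time) mechanism already
    planned to scan this round. All names here are illustrative.
    """
    # Number of cleaner rounds that fit into one full lap of the pool.
    rounds_per_lap = (scan_whole_pool_seconds * 1000) / delay_ms
    # Floor: scan at least this many buffers per round so that a full
    # lap completes in roughly scan_whole_pool_seconds.
    min_scan = int(pool_size / rounds_per_lap)
    # The floor only matters when the primary estimate is smaller.
    return max(jit_estimate, min_scan)


idle = buffers_to_scan(jit_estimate=0, pool_size=60000)
busy = buffers_to_scan(jit_estimate=500, pool_size=60000)
print(idle, busy)
```

With a hypothetical pool of 60000 buffers the floor works out to 100 buffers
per round, so it has no effect once the primary estimate is above that.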

With the way I've set this up, someone who has very infrequent checkpoints
and therefore low activity, like your 2000 case, can expect that the LRU
cleaner will lap and catch up to the strategy point about 2 minutes after
any activity and then follow directly behind it. If that's cleaning the
buffer cache too aggressively, I think those in that situation would be
better served by constraining the maxpages parameter; that directly
adjusts what I'd expect their real issue to be, how fast pages can flush to
disk, rather than the secondary one of how fast the pool is being scanned.
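
To make the distinction between scan rate and write rate concrete, here is a
small Python sketch of one cleaner round with a write cap. It is a
simplification under stated assumptions: the pool is modeled as a list of
dirty flags, every dirty buffer is treated as reusable, and the round is
assumed to stop as soon as the cap is reached. The function name and its
arguments are hypothetical.

```python
def cleaner_round(dirty_flags, start, n_scan, maxpages):
    """Scan up to n_scan buffers from start, writing at most maxpages.

    dirty_flags is a list of booleans standing in for the buffer pool.
    Returns (next_start, written). Illustrative model only.
    """
    pool = len(dirty_flags)
    pos = start
    written = 0
    scanned = 0
    # Stop when either the scan budget or the write cap is used up.
    while scanned < n_scan and written < maxpages:
        if dirty_flags[pos]:
            dirty_flags[pos] = False  # "write" the buffer out
            written += 1
        pos = (pos + 1) % pool  # the pool is scanned circularly
        scanned += 1
    return pos, written


pool = [True] * 10
capped = cleaner_round(pool, 0, n_scan=10, maxpages=3)
print(capped, pool.count(True))
```

In this model a low maxpages ends the round early, which limits the I/O no
matter how many buffers the round was allowed to scan.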

I picked 2 minutes for that value because it's as slow as I can make it
and still serve its purpose, while not feeling to me like it's too fast
for a relatively idle system even if someone set maxpages=1000.
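
For a sense of scale, the following Python sketch works out the worst-case
write rate that the 2 minute floor alone could produce on an otherwise idle
system, where every scanned buffer turns out to be dirty and reusable. The
pool size, the 200 ms delay between rounds, and the 8 kB page size are
assumptions chosen for illustration; the names are hypothetical.

```python
def worst_case_write_rate(pool_size, maxpages, delay_ms=200,
                          lap_seconds=120, page_bytes=8192):
    """Upper bound in bytes/second on writes driven by the lap-time floor.

    Assumes every buffer scanned needs writing. Illustrative only.
    """
    # Cleaner rounds in one full lap of the pool.
    rounds_per_lap = lap_seconds * 1000 // delay_ms
    # Buffers scanned per round to finish a lap on time (rounded up),
    # limited by the per-round write cap.
    per_round = min(maxpages, -(-pool_size // rounds_per_lap))
    pages_per_second = per_round * 1000 / delay_ms
    return pages_per_second * page_bytes


rate = worst_case_write_rate(pool_size=60000, maxpages=1000)
print(rate / 1_000_000, "MB/s")
```

Under these assumptions a 60000-buffer pool tops out at about 4 MB/s from the
floor alone, and maxpages=1000 is never the limiting factor.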

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
