From: Jim Nasby <decibel(at)decibel(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-06 04:11:14
Message-ID: 1D354595-03AE-49D7-980D-9A87988416E4@decibel.org
Lists: pgsql-hackers

On Mar 5, 2007, at 11:46 AM, Josh Berkus wrote:
> Tom,
>
>> I seem to recall that we've previously discussed the idea of letting
>> the clock sweep decrement the usage_count before testing for 0, so
>> that a buffer could be reused on the first sweep after it was
>> initially used, but that we rejected it as being a bad idea. But at
>> least with large shared_buffers it doesn't sound like such a bad idea.
>
> We did discuss a number of formulas for setting buffers with different
> clock-sweep numbers, including ones with higher usage_count for indexes
> and starting numbers of 0 for large seq scans as well as vacuums.
> However, we didn't have any way to prove that any of these complex
> algorithms would result in higher performance, so we went with the
> simplest formula, with the idea of tinkering with it when we had more
> data. So maybe now's the time.
>
> Note, though, that the current algorithm is working very, very well for
> OLTP benchmarks, so we'd want to be careful not to gain performance in
> one area at the expense of another. In TPCE testing, we've been able to
> increase shared_buffers to 10GB with beneficial performance effect
> (numbers posted when I have them), and we even found that "taking over
> RAM" with shared_buffers (a la Oracle) gave us performance equivalent
> to using the FS cache. (Yes, this means that with a little I/O
> management engineering we could contemplate discarding use of the FS
> cache for a net performance gain. Maybe for 8.4.)

An idea I've been thinking about would be to have the bgwriter or
some other background process actually try to keep the free list
populated, so that backends needing to grab a page would be much more
likely to find one there (and not have to scan through the entire
buffer pool, perhaps multiple times).

My thought is to keep track of how many page requests occurred during
a given interval, and use that value (probably averaged over time) to
determine how many pages we'd like to see on the free list. The
background process would then run through the buffers decrementing
usage counts until it found enough for the free list. Before putting
a buffer on the free list, it would write the buffer out; I'm not
sure whether it would make sense to dissociate the buffer from
whatever page it had been storing, though. If we don't do that, we
could pull pages back off the free list if we wanted to, which would
be helpful if the background process got a bit over-zealous.
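
To make that concrete, here's a rough C sketch of what one pass of
such a background process might look like. It's purely illustrative,
not actual PostgreSQL code: the Buffer struct, the write_buffer() and
push_free_list() hooks, the pool size, and the 0.9/0.1 smoothing
factor are all made up for the example.

#include <stdbool.h>

#define NBUFFERS 1024

typedef struct Buffer
{
    int  usage_count;    /* clock-sweep reference counter */
    bool dirty;          /* needs write-out before reuse */
    bool on_free_list;   /* already queued for reuse */
} Buffer;

static Buffer pool[NBUFFERS];
static int    sweep_hand;               /* clock hand position */
static int    requests_this_interval;   /* page requests since last run */
static double avg_requests;             /* smoothed demand estimate */

/* Hypothetical hooks supplied elsewhere. */
extern void write_buffer(Buffer *buf);     /* flush a dirty buffer */
extern void push_free_list(Buffer *buf);   /* append to the free list */

/*
 * One pass of the background free-list maintainer: estimate demand
 * from a moving average of recent page requests, then sweep the pool
 * decrementing usage counts, writing out and queueing zero-count
 * buffers until the free list is topped up.
 */
void
maintain_free_list(int current_free)
{
    avg_requests = 0.9 * avg_requests + 0.1 * requests_this_interval;
    requests_this_interval = 0;

    int needed = (int) avg_requests - current_free;

    /* Bound the sweep so one run can't spin forever. */
    for (int scanned = 0; needed > 0 && scanned < 2 * NBUFFERS; scanned++)
    {
        Buffer *buf = &pool[sweep_hand];

        sweep_hand = (sweep_hand + 1) % NBUFFERS;

        if (buf->on_free_list)
            continue;

        if (buf->usage_count > 0)
        {
            buf->usage_count--;         /* normal clock-sweep decay */
            continue;
        }

        /*
         * usage_count == 0: write it out first, then queue it.  The
         * buffer keeps its page association, so a backend can still
         * reclaim it from the free list if the page is re-requested
         * before the buffer is recycled.
         */
        if (buf->dirty)
        {
            write_buffer(buf);
            buf->dirty = false;
        }
        buf->on_free_list = true;
        push_free_list(buf);
        needed--;
    }
}

Since the page mapping stays intact, pulling a buffer back off the
free list when its page is re-requested is cheap, which covers the
case where the background process gets over-zealous.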
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
