Re: our buffer replacement strategy is kind of lame

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: our buffer replacement strategy is kind of lame
Date: 2011-08-12 12:26:14
Message-ID: CA+TgmoaNQ4G_7ciLFMC4Ei=1ao-Ps=p-Ej0QFByjhiXmOmi5hA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 12, 2011 at 4:36 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Aug 12, 2011 at 5:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> On
>> the other hand, the buffer manager has *no problem at all* trashing
>> the buffer arena if we're faulting in pages for an index scan rather
>> than a sequential scan.  If you manage to get all of sample_data into
>> memory (by running many copies of the above query in parallel, you can
>> get each one to allocate its own ring buffer, and eventually pull in
>> all the pages), and then run some query that probes an index which is
>> too large to fit in shared_buffers, it cheerfully blows the whole
>> sample_data table out without a second thought.  Had you sequentially
>> scanned a big table, of course, there would be some protection, but an
>> index scan can stomp all over everything with complete impunity.
>
> That's a good observation and I think we should do this
>
> * Make an IndexScan use a ring buffer once it has used 32 blocks. The
> vast majority won't do that, so we avoid overhead on the common path.
>
> * Make a BitmapIndexScan use a ring buffer when we know that the
> index is larger than 32 blocks (ignoring the upper levels of the
> tree for that calculation).
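
To make the shape of that heuristic concrete, here is a rough sketch
(purely illustrative; none of these names exist in the actual code):
a per-scan counter that flips the scan over to a small ring buffer
once it crosses the 32-block threshold, so the common short scan pays
only an increment and a branch.

#include <stdbool.h>

/*
 * Hypothetical sketch only: count the blocks an index scan has
 * touched, and once the count passes a small threshold, start taking
 * buffers from a small ring instead of the main arena.
 */
#define RING_SWITCH_THRESHOLD   32

typedef struct ScanBufferPolicy
{
    unsigned long blocks_touched;   /* blocks read so far by this scan */
    bool          using_ring;       /* switched to a ring buffer yet? */
} ScanBufferPolicy;

/* Call once per block read; returns true once a ring should be used. */
static bool
scan_should_use_ring(ScanBufferPolicy *policy)
{
    if (!policy->using_ring &&
        ++policy->blocks_touched > RING_SWITCH_THRESHOLD)
        policy->using_ring = true;
    return policy->using_ring;
}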

We'd need to think about what happens to the internal pages of the
btree, the leaf pages, and then the heap pages from the underlying
relation; those probably shouldn't all be treated the same. Also, I
think the tricky part is figuring out when to apply the optimization
in the first place. Once we decide we need a ring buffer, a very
small one (like 32 blocks) is probably the way to go. But it will be
a loser to apply the optimization to data sets that would otherwise
have fit in shared_buffers. This is a classic case of the LRU/MRU
problem. You want to evict buffers in LRU fashion until the working
set gets larger than you can cache; and then you want to switch to MRU
to avoid uselessly caching pages that you'll never manage to revisit
before they're evicted.
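
To put toy numbers on that, here is a tiny standalone simulation
(nothing to do with the real buffer manager, just the textbook
effect): a 4-page cache repeatedly scanning a 5-page table gets zero
hits after the first pass under strict LRU, because it always evicts
exactly the page it is about to need again, while evicting the
most-recently-used page keeps 4 of the 5 pages resident on every
later pass.

#include <stdio.h>
#include <stdbool.h>

#define CACHE_SIZE  4
#define TABLE_PAGES 5
#define PASSES      10

/*
 * Simulate repeated sequential scans against a tiny page cache,
 * evicting either the least- or the most-recently-used page.
 */
static int
simulate(bool evict_mru)
{
    int  cache[CACHE_SIZE];
    long last_use[CACHE_SIZE];
    int  nloaded = 0;
    long tick = 0;
    int  hits = 0;

    for (int pass = 0; pass < PASSES; pass++)
    {
        for (int page = 0; page < TABLE_PAGES; page++)
        {
            int found = -1;

            tick++;
            for (int i = 0; i < nloaded; i++)
                if (cache[i] == page)
                    found = i;

            if (found >= 0)
            {
                hits++;
                last_use[found] = tick;
            }
            else if (nloaded < CACHE_SIZE)
            {
                cache[nloaded] = page;
                last_use[nloaded] = tick;
                nloaded++;
            }
            else
            {
                /* victim = oldest (LRU) or newest (MRU) last_use */
                int victim = 0;

                for (int i = 1; i < CACHE_SIZE; i++)
                    if (evict_mru ? last_use[i] > last_use[victim]
                                  : last_use[i] < last_use[victim])
                        victim = i;
                cache[victim] = page;
                last_use[victim] = tick;
            }
        }
    }
    return hits;
}

int
main(void)
{
    printf("LRU hits: %d of %d\n", simulate(false), PASSES * TABLE_PAGES);
    printf("MRU hits: %d of %d\n", simulate(true), PASSES * TABLE_PAGES);
    return 0;
}

As written it prints 0 hits out of 50 for LRU and 36 for MRU; the gap
only exists because the table is one page bigger than the cache.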

The point of algorithms like Clock-Pro is to try to have the system
work that out on the fly, based on the actual workload, rather than
using heuristics. I agree with you that there's no convincing
evidence that Clock-Pro would be better for us; I mostly thought it
was interesting because the NetBSD and Linux guys seem to find it
interesting, and they're managing much larger caches than the ones
we're dealing with.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
