Re: 2nd Level Buffer Cache

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, rsmogura <rsmogura(at)softperience(dot)eu>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 2nd Level Buffer Cache
Date: 2011-03-25 16:26:38
Message-ID: AANLkTi=A2sa=cTd998JkXYaCg0m7JGirEvqcnm8V2K2_@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 24, 2011 at 7:51 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Thu, Mar 24, 2011 at 11:33 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> I tried under the circumstances I thought were most likely to show a
>> time difference, and I was unable to detect a reliable difference in
>> timing between free list and clock sweep.
>
> It strikes me that it shouldn't be terribly hard to add a profiling
> option to Postgres to dump out a list of precisely which blocks of
> data were accessed in which order. Then it's fairly straightforward to
> process that list using different algorithms to measure which
> generates the fewest cache misses.

It is pretty easy to get the list by adding a couple of elog calls. To
be safe you probably also need to record pins and unpins, as you can't
evict a pinned buffer no matter how otherwise eligible it might be.
For most workloads you might be able to get away with just assuming
that if a buffer is eligible for replacement under any reasonable
strategy, then it is very unlikely to still be pinned. Also, if the
list is derived from a concurrent environment, then the order of
access you see under a particular policy might no longer be the same
if a different policy were adopted.

But whose workload would you use to do the testing? The ones I was
testing were simple enough that I already know what the access pattern
is: the root and first-level branch blocks are almost always in shared
buffers, while the leaf and table blocks almost never are.

Here my concern was not how to choose which block to replace in a
conceptual way, but rather how to code that selection in a way that is
fast, concurrent, and low-latency for the latency-sensitive
processes. Either method will evict the same blocks, apart from
differences introduced by race conditions that get resolved
differently.

A benefit of focusing on the implementation rather than the high-level
selection strategy is that improvements in implementation are more
likely to carry over well to other workloads.

My high level conclusions were that the running of the selection is
generally not a bottleneck, and in the cases where it was, the
bottleneck was due to contention on the LWLock, regardless of what was
done under that lock. Changing who does the clock-sweep is probably
not meaningful unless it facilitates a lock-strength reduction or
other contention reduction.

I have also played with simulations of different algorithms for
managing the usage_count, and while I could get improvements, they
weren't big enough or general enough to be very exciting. It was
generally the case that if the data size was X, the improvement was
maybe 30% over the current algorithm, but if the data size was <0.8X
or >1.2X, there was no difference. So not very general.

Cheers,

Jeff
