Re: 2nd Level Buffer Cache

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 2nd Level Buffer Cache
Date: 2011-03-24 21:59:41
Message-ID: 201103242259.41631.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> Thursday 24 March 2011 22:41:19
> On Thu, Mar 24, 2011 at 5:34 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> > On Thu, Mar 24, 2011 at 8:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
wrote:
> >> It seems at least plausible that buffer allocation could be
> >> significantly faster if it need only pop the head of a list, rather
> >> than scanning until it finds a suitable candidate. Moving as much
> >> buffer allocation work as possible into the background seems like it
> >> ought to be useful.
> >
> > Linked lists are notoriously non-concurrent, that's the whole reason
> > for the clock sweep algorithm to exist at all instead of just using an
> > LRU directly. That said, an LRU needs to be able to remove elements
> > from the middle and not just enqueue elements on the tail, so the
> > situation isn't exactly equivalent.
> >
> > Just popping off the head is simple enough but the bgwriter would need
> > to be able to add elements to the tail of the list and the people
> > popping elements off the head would need to compete with it for the
> > lock on the list. And I think you need a single lock for the whole
> > list because of the cases where the list becomes a single element or
> > empty.
> >
> > The main impact this list would have is that it would presumably need
> > some real number of buffers to satisfy the pressure for victim buffers
> > for a real amount of time. That would represent a decrease in cache
> > size, effectively evicting buffers from cache as if the cache were
> > smaller by that amount.
> >
> > Theoretical results are that a small change in cache size affects
> > cache hit rates substantially. I'm not sure that's born out by
> > practical experience with Postgres though. People tend to either be
> > doing mostly i/o or very little i/o. Cache hit rate only really
> > matters and is likely to be affected by small changes in cache size in
> > the space in between
>
> You wouldn't really have to reduce the effective cache size - there's
> logic in there to just skip to the next buffer if the first one you
> pull off the freelist has been reused. But the cache hit rates on
> those buffers would (you'd hope) be fairly low, since they are the
> ones we're about to reuse. Maybe it doesn't work out to a win,
> though.
If I may,
Under unnormal circumstances (like current process is "held" by kernel)
obtaining buffer from list may be cheaper
this code
while (StrategyControl->firstFreeBuffer >= 0)
{
buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);

/* Unconditionally remove buffer from freelist */
StrategyControl->firstFreeBuffer = buf->freeNext;
buf->freeNext = FREENEXT_NOT_IN_LIST;
could look
do
{
SpinLock();
if (StrategyControl->firstFreeBuffer >= 0) {
Unspin()
break;
}

buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
Unspin();

Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);

/* Unconditionally remove buffer from freelist */
StrategyControl->firstFreeBuffer = buf->freeNext;
buf->freeNext = FREENEXT_NOT_IN_LIST;like this
}while(true);
and aquirng spin lock for linked list is enaugh, and cheaper then taking
lwlock is more complex than spin on this.

after this simmilary with spin lock on
trycounter = NBuffers*4;
for (;;)
{
spinlock()
buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];

if (++StrategyControl->nextVictimBuffer >= NBuffers)
{
StrategyControl->nextVictimBuffer = 0;
StrategyControl->completePasses++;
}
unspin();

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2011-03-24 22:05:12 Re: crash-safe visibility map, take four
Previous Message Stephen Frost 2011-03-24 21:55:47 Re: Pre-set Hint bits/VACUUM FREEZE on data load..?