Re: our buffer replacement strategy is kind of lame

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: our buffer replacement strategy is kind of lame
Date: 2012-01-05 22:09:44
Message-ID: 4F061FA8.7000306@2ndQuadrant.com
Lists: pgsql-hackers
On 01/03/2012 06:22 PM, Jim Nasby wrote:
> On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
>    
>> I think that our current freelist is practically useless, because it
>> is almost always empty, and the cases where it's not empty (startup,
>> and after a table or database drop) are so narrow that we don't really
>> get any benefit out of having it.  However, I'm not opposed to the
>> idea of a freelist in general: I think that if we actually put in some
>> effort to keep the freelist in a non-empty state it would help a lot,
>> because backends would then have much less work to do at buffer
>> allocation time.
>>      
> This is exactly what the FreeBSD VM system does (which is at least one of the places where the idea of a clock sweep for PG came from ages ago). There is a process that does nothing but attempt to keep X amount of memory on the free list, where it can immediately be grabbed by anything that needs memory. Pages on the freelist are guaranteed to be clean (as in not dirty), but not zero'd. In fact, IIRC if a page on the freelist gets referenced again it can be pulled back out of the free list and put back into an active state.
>
> The one downside I see to this is that we'd need some heuristic to determine how many buffers we want to keep on the free list.
>    

http://wiki.postgresql.org/wiki/Todo#Background_Writer has "Consider 
adding buffers the background writer finds reusable to the free list" 
and "Automatically tune bgwriter_delay based on activity rather than 
using a fixed interval", which both point to my 8.3-era musings and other 
suggestions starting at 
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php.  I 
could write both of those in an afternoon.  The auto-tuning code already 
in the background writer was originally intended to tackle this issue, 
but I dropped that part in favor of shipping something simpler first.  
There's even a prototype somewhere on an old drive here.
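
To make the first TODO item concrete, here is a minimal sketch in Python of a background pass that keeps a free list topped up so that allocation usually takes the cheap path.  It is not taken from the PostgreSQL source: the class and method names, the freelist_target knob, and the cap of 5 on the usage counter are all assumptions made for illustration, and writing out a dirty page is reduced to clearing a flag.

```python
from collections import deque
from dataclasses import dataclass

MAX_USAGE_COUNT = 5  # assumed cap on the per-buffer usage counter


@dataclass
class Buffer:
    usage_count: int = 0
    dirty: bool = False
    pinned: bool = False
    on_freelist: bool = False


class BufferPool:
    """Toy buffer pool: clock sweep plus a background-filled free list."""

    def __init__(self, nbuffers, freelist_target):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.freelist = deque()
        self.freelist_target = freelist_target  # hypothetical tuning knob
        self.sweep_hand = 0  # clock hand used by allocating backends
        self.bg_pos = 0      # separate scan position for the background pass

    def touch(self, idx):
        """Record a reference to a buffer, bumping its usage counter."""
        buf = self.buffers[idx]
        buf.usage_count = min(MAX_USAGE_COUNT, buf.usage_count + 1)

    def background_refill(self):
        """One background pass: scan at most once around the pool and put
        reusable buffers on the free list until the target is reached."""
        added = 0
        for _ in range(len(self.buffers)):
            if len(self.freelist) >= self.freelist_target:
                break
            idx = self.bg_pos
            self.bg_pos = (self.bg_pos + 1) % len(self.buffers)
            buf = self.buffers[idx]
            if buf.pinned or buf.on_freelist or buf.usage_count > 0:
                continue
            buf.dirty = False  # stand-in for writing a dirty page out first
            buf.on_freelist = True
            self.freelist.append(idx)
            added += 1
        return added

    def allocate(self):
        """Return a victim buffer index, preferring the free list."""
        # Fast path: pop from the free list, skipping entries that were
        # referenced or pinned again after they were queued.
        while self.freelist:
            idx = self.freelist.popleft()
            buf = self.buffers[idx]
            buf.on_freelist = False
            if not buf.pinned and buf.usage_count == 0:
                buf.usage_count = 1
                return idx
        # Slow path: the caller runs the clock sweep itself.
        for _ in range(len(self.buffers) * (MAX_USAGE_COUNT + 1)):
            idx = self.sweep_hand
            self.sweep_hand = (self.sweep_hand + 1) % len(self.buffers)
            buf = self.buffers[idx]
            if buf.pinned:
                continue
            if buf.usage_count > 0:
                buf.usage_count -= 1
                continue
            buf.dirty = False  # stand-in for a backend write
            buf.usage_count = 1
            return idx
        raise RuntimeError("no unpinned buffer available")
```

The point of the split is that allocate() only falls into the sweep loop when the background pass has fallen behind, which is the case the heuristic for freelist_target would have to keep rare.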

The first missing piece needed before this was useful was separating out 
the background writer and checkpointer processes.  Once I realized the 
checkpoints were monopolizing so much time, especially when they hit bad 
states, it was obvious the writer couldn't be relied upon for this job.  
That's much better now since Simon's 
806a2aee3791244bf0f916729bfdb5489936e068 "Split work of bgwriter between 
2 processes: bgwriter and checkpointer", which just became available in 
November to build on.

The second missing piece blocking this work in my mind was how exactly 
we're going to benchmark the result, mainly to prove it doesn't hurt 
some workloads.  I haven't fully internalized the implications of 
Robert's upthread comments, in terms of being able to construct a 
benchmark stressing both the best and worst case situation here.  That's 
really the hardest part of this whole thing, by a lot.  Recent spending 
has brought me an 8 HyperThread core laptop that can also run DTrace, so 
I expect to have better visibility into this soon too.

I think here in 2012 the idea of having a background writer process that 
could potentially occupy most of a whole core doing work so backends 
don't have to is an increasingly attractive one.  So long as that comes 
along with an auto-tuning delay, it shouldn't hurt the work toward 
lowering power consumption either.  It might even help, by allowing 
larger values of bgwriter_delay than you'd want to use during busy 
periods.  I was planning to mimic the sort of fast attack/slow decay 
logic already used for the auto-tuned timing, so that you won't fall 
behind by more than one bgwriter_delay worth of activity.  Then it 
should realize a burst is here and the writer has to start moving faster.
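
A minimal sketch of the fast attack/slow decay idea applied to the delay, in Python.  The function names, the 16-sample smoothing window, the doubling step, and the 200 ms / 10000 ms bounds are assumptions chosen for illustration, not values proposed in this thread.

```python
def smooth_demand(smoothed, recent, samples=16):
    """Fast attack, slow decay estimate of buffer allocations per cycle.

    A rise in demand is adopted at once; a fall is averaged in over
    roughly `samples` cycles.
    """
    if recent >= smoothed:
        return float(recent)
    return smoothed + (recent - smoothed) / samples


def next_delay_ms(current_ms, smoothed, recent, min_ms=200, max_ms=10000):
    """Pick the next sleep time from the latest cycle's activity.

    `smoothed` is the estimate from before this cycle.  A burst above it
    snaps the delay back to the minimum, so the writer is late by at most
    one sleep; quiet cycles let the delay back off gradually.
    """
    if recent > smoothed:
        return min_ms
    return min(max_ms, current_ms * 2)
```

Each cycle would call next_delay_ms() with the old estimate first and then update the estimate with smooth_demand(), so a single busy interval is enough to pull the writer back to its fastest rate.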

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
