Re: our buffer replacement strategy is kind of lame

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: our buffer replacement strategy is kind of lame
Date: 2012-01-05 22:09:44
Message-ID: 4F061FA8.7000306@2ndQuadrant.com
Lists: pgsql-hackers

On 01/03/2012 06:22 PM, Jim Nasby wrote:
> On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
>
>> I think that our current freelist is practically useless, because it
>> is almost always empty, and the cases where it's not empty (startup,
>> and after a table or database drop) are so narrow that we don't really
>> get any benefit out of having it. However, I'm not opposed to the
>> idea of a freelist in general: I think that if we actually put in some
>> effort to keep the freelist in a non-empty state it would help a lot,
>> because backends would then have much less work to do at buffer
>> allocation time.
>>
> This is exactly what the FreeBSD VM system does (which is at least one of the places where the idea of a clock sweep for PG came from ages ago). There is a process that does nothing but attempt to keep X amount of memory on the free list, where it can immediately be grabbed by anything that needs memory. Pages on the freelist are guaranteed to be clean (as in not dirty), but not zero'd. In fact, IIRC if a page on the freelist gets referenced again it can be pulled back out of the free list and put back into an active state.
>
> The one downside I see to this is that we'd need some heuristic to determine how many buffers we want to keep on the free list.
>

http://wiki.postgresql.org/wiki/Todo#Background_Writer has "Consider
adding buffers the background writer finds reusable to the free list"
and "Automatically tune bgwriter_delay based on activity rather than
using a fixed interval", which both point to my 8.3 musings and other
suggestions starting at
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php . I
could write both of those in an afternoon. The auto-tuning code already
in the background writer was originally intended to tackle this issue,
but I dropped that part in favor of shipping something simpler first.
There's even a prototype somewhere on an old drive here.
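
As a rough illustration of the first TODO item, here is a minimal sketch of a background pass that puts reusable buffers on a free list so that backends can grab one without scanning. It is a toy model, not PostgreSQL code: the names BufferPool, refill_freelist, allocate, and freelist_target are hypothetical, "reusable" is assumed to mean unpinned with a usage count of zero, and writing out a dirty page is reduced to clearing a flag.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    buf_id: int
    usage_count: int = 0
    pinned: bool = False
    dirty: bool = False
    on_freelist: bool = False


class BufferPool:
    """Toy buffer pool; all names here are hypothetical, not PostgreSQL's."""

    def __init__(self, n_buffers, freelist_target):
        self.buffers = [Buffer(i) for i in range(n_buffers)]
        self.freelist = deque()
        self.freelist_target = freelist_target  # assumed tuning knob
        self.scan_pos = 0  # where the background scan resumes

    def refill_freelist(self, max_scan):
        """One background round: scan up to max_scan buffers and add the
        reusable ones until the free list reaches its target size."""
        added = 0
        for _ in range(max_scan):
            if len(self.freelist) >= self.freelist_target:
                break
            buf = self.buffers[self.scan_pos]
            self.scan_pos = (self.scan_pos + 1) % len(self.buffers)
            if buf.pinned or buf.usage_count > 0 or buf.on_freelist:
                continue  # not reusable right now
            if buf.dirty:
                buf.dirty = False  # stand-in for writing the page out
            buf.on_freelist = True
            self.freelist.append(buf.buf_id)
            added += 1
        return added

    def allocate(self):
        """Backend path: pop a free buffer, skipping any that were
        referenced again after being listed. Returns None when the list
        is empty, in which case a backend would have to scan for itself."""
        while self.freelist:
            buf = self.buffers[self.freelist.popleft()]
            buf.on_freelist = False
            if not buf.pinned and buf.usage_count == 0:
                buf.pinned = True
                buf.usage_count = 1
                return buf.buf_id
        return None
```

The open question raised upthread, how many buffers to keep on the list, shows up here as the single freelist_target parameter; the sketch does not attempt to tune it.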

The first missing piece needed before this was useful was separating out
the background writer and checkpointer processes. Once I realized the
checkpoints were monopolizing so much time, especially when they hit bad
states, it was obvious the writer couldn't be relied upon for this job.
That's much better now since Simon's
806a2aee3791244bf0f916729bfdb5489936e068 "Split work of bgwriter between
2 processes: bgwriter and checkpointer", which just became available in
November to build on.

The second missing piece blocking this work in my mind was how exactly
we're going to benchmark the result, mainly to prove it doesn't hurt
some workloads. I haven't fully internalized the implications of
Robert's upthread comments, in terms of being able to construct a
benchmark stressing both the best and worst case situation here. That's
really the hardest part of this whole thing, by a lot. Recent spending
has brought me an 8 HyperThread core laptop that can also run DTrace, so
I expect to have better visibility into this soon too.

I think here in 2012 the idea of having a background writer process that
could potentially occupy most of a whole core doing work so backends
don't have to is an increasingly attractive one. So long as that comes
along with an auto-tuning delay, it shouldn't hurt the work toward
lowering power consumption either. It might even help, by allowing
larger values of bgwriter_delay than you'd want to use during busy
periods. I was planning to mimic the sort of fast attack/slow decay
logic already used for the auto-tuned timing, so that you won't fall
behind by more than one bgwriter_delay worth of activity. After that,
it should realize a burst is here and the writer has to start moving
faster.
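
A minimal sketch of what such an auto-tuned delay could look like, assuming the writer can see how many buffers were allocated during its last sleep. The function name next_delay_ms and the parameters busy_threshold and decay_samples are hypothetical, and the defaults are illustrative only. Any activity snaps the delay straight back to the minimum (the fast attack), while idle cycles lengthen it by a fraction of the remaining distance to the maximum.

```python
def next_delay_ms(current_ms, recent_allocs, min_ms=200.0, max_ms=10000.0,
                  busy_threshold=1, decay_samples=16):
    """Pick the next sleep interval for a hypothetical auto-tuned writer.

    recent_allocs is the number of buffer allocations seen during the
    last sleep; every name and default here is an assumption.
    """
    if recent_allocs >= busy_threshold:
        # Fast attack: a burst was seen, so go back to the shortest
        # delay at once. At most one long sleep of activity is missed.
        return min_ms
    # Slow back-off: while idle, close 1/decay_samples of the gap to
    # the longest allowed delay on each cycle.
    return min(max_ms, current_ms + (max_ms - current_ms) / decay_samples)
```

Starting from 200 ms with max_ms=1000 and decay_samples=4, two idle cycles give 400 ms and then 550 ms, and the first cycle that sees allocations returns to 200 ms.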

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
