Re: our buffer replacement strategy is kind of lame

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: our buffer replacement strategy is kind of lame
Date: 2012-01-04 01:27:26
Message-ID: CA+TgmobYD_dDd1ipBJ0t9a99=3PfiYQJuXD2jeXrO7N_yjyq0g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 3, 2012 at 6:22 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
>>> So you don't think a freelist is worth having, but you want a list of
>>> allocation targets.
>>> What is the practical difference?
>>
>> I think that our current freelist is practically useless, because it
>> is almost always empty, and the cases where it's not empty (startup,
>> and after a table or database drop) are so narrow that we don't really
>> get any benefit out of having it.  However, I'm not opposed to the
>> idea of a freelist in general: I think that if we actually put in some
>> effort to keep the freelist in a non-empty state it would help a lot,
>> because backends would then have much less work to do at buffer
>> allocation time.
>
> This is exactly what the FreeBSD VM system does (which is at least one of the places where the idea of a clock sweep for PG came from ages ago). There is a process that does nothing but attempt to keep X amount of memory on the free list, where it can immediately be grabbed by anything that needs memory. Pages on the freelist are guaranteed to be clean (as in not dirty), but not zero'd. In fact, IIRC if a page on the freelist gets referenced again it can be pulled back out of the free list and put back into an active state.
>
> The one downside I see to this is that we'd need some heuristic to determine how many buffers we want to keep on the free list.

Fortuitously, I believe the background writer already has most of the
necessary logic: it attempts to predict how many buffers are about to
be needed - I think based on a decaying average.
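
As a rough illustration of the decaying-average idea, here is a minimal
sketch. It is not the actual background writer code: the function names,
the smoothing_samples value, and the cycles_ahead multiplier are all
assumptions made for the example.

```python
import math


def smoothed_demand(recent_allocs, smoothing_samples=16):
    """Decaying average of per-cycle buffer allocation counts.

    Each new observation moves the estimate 1/smoothing_samples of the
    way toward itself, so older cycles fade out gradually.
    (smoothing_samples is a hypothetical tuning knob.)
    """
    estimate = 0.0
    for allocs in recent_allocs:
        estimate += (allocs - estimate) / smoothing_samples
    return estimate


def freelist_target(estimate, cycles_ahead=2):
    """Hypothetical heuristic: keep enough free buffers to cover the
    predicted demand for the next few cycles."""
    return math.ceil(estimate * cycles_ahead)
```

Something like freelist_target(smoothed_demand(history)) could then serve
as the "how many buffers to keep on the free list" heuristic, with the
multiplier left as a tunable.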

Actually, I think that logic could use some improvement, because I
believe I've heard Greg Smith comment that it's often necessary to
tune bgwriter_delay downward. It'd be nice to make the delay adaptive
somehow, to avoid the need for manual tuning (and unnecessary wake-ups
when the system goes idle).
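
One way an adaptive delay might look, purely as a sketch: halve the sleep
time while there is work to do, and double it while the system is idle.
The function name, the halve/double policy, and the min_ms/max_ms bounds
are assumptions for illustration, not existing behavior.

```python
def next_delay_ms(current_ms, buffers_needed, min_ms=10, max_ms=10000):
    """Pick the next sleep interval for a hypothetical cleaning loop.

    Wake up more often while buffers are in demand; back off
    exponentially when nothing is needed, to avoid pointless wake-ups.
    """
    if buffers_needed > 0:
        return max(min_ms, current_ms // 2)
    return min(max_ms, current_ms * 2)
```

For example, starting from 200 ms, a busy cycle would shorten the next
sleep to 100 ms and an idle one would lengthen it to 400 ms.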

But possibly the existing logic is good enough for a first cut.
However, in the interest of full disclosure, I'll admit that I've done
no testing in this area at all and am talking mostly out of my
posterior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
