Re: our buffer replacement strategy is kind of lame

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: our buffer replacement strategy is kind of lame
Date:	2012-01-20 14:29:28
Message-ID:	4F197A48.20606@enterprisedb.com
Lists:	pgsql-hackers

On 03.01.2012 17:56, Simon Riggs wrote:
> On Tue, Jan 3, 2012 at 3:18 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>
>>> 2. When a backend can't find a free buffer, it spins for a long time
>>> while holding the lock. This makes the buffer strategy O(N) in its
>>> worst case, which slows everything down. Notably, while this is
>>> happening the bgwriter sits doing nothing at all, right at the moment
>>> when it is most needed. The Clock algorithm is an approximation of an
>>> LRU, so is already suboptimal as a "perfect cache". Tweaking to avoid
>>> worst case behaviour makes sense. How much to tweak? Well,...
>>
>> I generally agree with this analysis, but I don't think the proposed
>> patch is going to solve the problem. It may have some merit as a way
>> of limiting the worst case behavior. For example, if every shared
>> buffer has a reference count of 5, the first buffer allocation that
>> misses is going to have a lot of work to do before it can actually
>> come up with a victim. But I don't think it's going to provide good
>> scaling in general. Even if the background writer only spins through,
>> on average, ten or fifteen buffers before finding one to evict, that
>> still means we're acquiring ten or fifteen spinlocks while holding
>> BufFreelistLock. I don't currently have the measurements to prove
>> that's too expensive, but I bet it is.
>
> I think it's worth reducing the cost of scanning, but that has little
> to do with solving the O(N) problem. I think we need both.
>
> I've left the way open for you to redesign freelist management in many
> ways. Please take the opportunity and go for it, though we must
> realise that major overhauls require significantly more testing to
> prove their worth. Reducing spinlocking only sounds like a good way to
> proceed for this release.
>
> If you don't have time in 9.2, then these two small patches are worth
> having. The bgwriter locking patch needs less formal evidence to show
> its worth. We simply don't need to have a bgwriter that just sits
> waiting doing nothing.

I'd like to see some benchmarks that show a benefit from these patches,
before committing something like this that complicates the code. These
patches are fairly small, but nevertheless. Once we have a test case, we
can argue whether the benefit we're seeing is worth the extra code, or
if there's some better way to achieve it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
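
The worst case described in the quoted text (every shared buffer carrying a reference count of 5, so the first allocation that misses has a lot of work to do) can be illustrated with a toy clock sweep. This is only a rough sketch under stated assumptions, not PostgreSQL code and not a benchmark: the names ClockSweep, find_victim and MAX_USAGE_COUNT are hypothetical, and locking, pinned buffers and the freelist are left out entirely. It just counts how many buffers a single sweep inspects before it finds a victim.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: the cap of 5 is taken from the "reference count of 5"
# example in the quoted discussion; the name is hypothetical.
MAX_USAGE_COUNT = 5


@dataclass
class ClockSweep:
    """Toy clock-sweep model: one usage count per buffer plus a clock hand."""

    usage_counts: List[int]
    hand: int = 0

    def find_victim(self) -> Tuple[int, int]:
        """Advance the hand until a buffer with usage count 0 is found.

        Returns (victim_index, buffers_inspected). Every buffer passed
        over with a nonzero count has that count decremented by one.
        """
        n = len(self.usage_counts)
        inspected = 0
        while True:
            idx = self.hand
            self.hand = (self.hand + 1) % n
            inspected += 1
            if self.usage_counts[idx] == 0:
                return idx, inspected
            self.usage_counts[idx] -= 1


if __name__ == "__main__":
    n_buffers = 1000
    sweep = ClockSweep([MAX_USAGE_COUNT] * n_buffers)
    victim, inspected = sweep.find_victim()
    # Five full revolutions to drain the counts, plus one more step.
    print(f"victim={victim} inspected={inspected}")
```

With N buffers all at the maximum count, the first call inspects 5N + 1 buffers, which is the O(N) behaviour under discussion. A real test case of the kind requested above would still need to measure lock contention on an actual server, which this sketch does not model.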

In response to

Re: our buffer replacement strategy is kind of lame at 2012-01-03 15:56:12 from Simon Riggs

Responses

Re: our buffer replacement strategy is kind of lame at 2012-01-20 14:59:41 from Robert Haas
