Re: our buffer replacement strategy is kind of lame

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: our buffer replacement strategy is kind of lame
Date: 2012-01-20 14:29:28
Message-ID: 4F197A48.20606@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03.01.2012 17:56, Simon Riggs wrote:
> On Tue, Jan 3, 2012 at 3:18 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>
>>> 2. When a backend can't find a free buffer, it spins for a long time
>>> while holding the lock. This makes the buffer strategy O(N) in its
>>> worst case, which slows everything down. Notably, while this is
>>> happening the bgwriter sits doing nothing at all, right at the moment
>>> when it is most needed. The Clock algorithm is an approximation of an
>>> LRU, so is already suboptimal as a "perfect cache". Tweaking to avoid
>>> worst case behaviour makes sense. How much to tweak? Well,...
>>
>> I generally agree with this analysis, but I don't think the proposed
>> patch is going to solve the problem. It may have some merit as a way
>> of limiting the worst case behavior. For example, if every shared
>> buffer has a reference count of 5, the first buffer allocation that
>> misses is going to have a lot of work to do before it can actually
>> come up with a victim. But I don't think it's going to provide good
>> scaling in general. Even if the background writer only spins through,
>> on average, ten or fifteen buffers before finding one to evict, that
>> still means we're acquiring ten or fifteen spinlocks while holding
>> BufFreelistLock. I don't currently have the measurements to prove
>> that's too expensive, but I bet it is.
>
> I think its worth reducing the cost of scanning, but that has little
> to do with solving the O(N) problem. I think we need both.
>
> I've left the way open for you to redesign freelist management in many
> ways. Please take the opportunity and go for it, though we must
> realise that major overhauls require significantly more testing to
> prove their worth. Reducing spinlocking only sounds like a good way to
> proceed for this release.
>
> If you don't have time in 9.2, then these two small patches are worth
> having. The bgwriter locking patch needs less formal evidence to show
> its worth. We simply don't need to have a bgwriter that just sits
> waiting doing nothing.

I'd like to see some benchmarks that show a benefit from these patches,
before committing something like this that complicates the code. These
patches are fairly small, but nevertheless. Once we have a test case, we
can argue whether the benefit we're seeing is worth the extra code, or
if there's some better way to achieve it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2012-01-20 14:34:11 Re: pg_basebackup option for handling symlinks
Previous Message Dimitri Fontaine 2012-01-20 14:28:40 Re: Command Triggers