Re: our buffer replacement strategy is kind of lame

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: our buffer replacement strategy is kind of lame
Date: 2012-01-03 15:56:12
Message-ID: CA+U5nMJKKTr2J2e8mBm7JWmacwyzrMotW=-OyjZORZVSRCZe-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 3, 2012 at 3:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> The clock sweep is where all the time goes, in its current form.
>
> ...but I agree with this.  In its current form, the clock sweep has to
> acquire a spinlock for every buffer it touches.  That's really
> expensive, and I think we need to either get rid of it altogether or
> at least find some way to move it into the background.  Ideally, in
> the common case, a backend that wants to allocate a buffer shouldn't
> need to do more than acquire a spinlock, pop a buffer off some sort of
> linked list of allocation targets, and release the spinlock.  Whatever
> other computation is needed should get pushed out of foreground
> processing.

So you don't think a freelist is worth having, but you want a list of
allocation targets.
What is the practical difference?

>> We have 2 problems
>>
>> 1. StrategySyncStart() requests exclusive lock on BufFreelistLock, so
>> bgwriter has to fight with backends to find out where it should clean.
>> As a result it spends lots of time waiting and insufficient time
>> cleaning.
>
> I have trouble accepting that this is really the problem we should be
> solving.  There's only one background writer process, and that process
> is waking up every 200ms by default.  Many people probably tune that
> down quite a bit, but unless I'm quite mistaken, it should still be a
> drop in the bucket next to what the backends are doing.

That doesn't follow. Forcing the bgwriter to wait makes no sense. Yes,
most of the contention is caused by the other backends, but the
bgwriter experiences that contention currently when it has no need to
do so.

Presumably if you did have an allocation list maintained in the
background by the bgwriter you wouldn't expect the bgwriter to wait
behind a lock for no reason when it could be doing useful work.

>> 2. When a backend can't find a free buffer, it spins for a long time
>> while holding the lock. This makes the buffer strategy O(N) in its
>> worst case, which slows everything down. Notably, while this is
>> happening the bgwriter sits doing nothing at all, right at the moment
>> when it is most needed. The Clock algorithm is an approximation of an
>> LRU, so is already suboptimal as a "perfect cache". Tweaking to avoid
>> worst case behaviour makes sense. How much to tweak? Well,...
>
> I generally agree with this analysis, but I don't think the proposed
> patch is going to solve the problem.  It may have some merit as a way
> of limiting the worst case behavior.  For example, if every shared
> buffer has a reference count of 5, the first buffer allocation that
> misses is going to have a lot of work to do before it can actually
> come up with a victim.  But I don't think it's going to provide good
> scaling in general.  Even if the background writer only spins through,
> on average, ten or fifteen buffers before finding one to evict, that
> still means we're acquiring ten or fifteen spinlocks while holding
> BufFreelistLock. I don't currently have the measurements to prove
> that's too expensive, but I bet it is.

I think its worth reducing the cost of scanning, but that has little
to do with solving the O(N) problem. I think we need both.

I've left the way open for you to redesign freelist management in many
ways. Please take the opportunity and go for it, though we must
realise that major overhauls require significantly more testing to
prove their worth. Reducing spinlocking only sounds like a good way to
proceed for this release.

If you don't have time in 9.2, then these two small patches are worth
having. The bgwriter locking patch needs less formal evidence to show
its worth. We simply don't need to have a bgwriter that just sits
waiting doing nothing.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-01-03 15:59:12 Re: patch: ALTER TABLE IF EXISTS
Previous Message Pavel Stehule 2012-01-03 15:38:55 Re: patch: ALTER TABLE IF EXISTS