Re: Clock sweep not caching enough B-Tree leaf pages?

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date: 2014-04-16 13:35:33
Message-ID: 20140416133533.GH17874@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-04-16 08:25:23 -0500, Merlin Moncure wrote:
> The downside of this approach was its complexity, and the difficulty
> of testing its edge cases. I would like to point out though that while
> i/o efficiency gains are nice, I think contention issues are the bigger
> fish to fry.

That's my feeling as well.

>
> On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote:
> >> What about: 9. Don't wait on locked buffer in the clock sweep.
> >
> > I don't think we do that? Or are you referring to locked buffer headers?
>
> Right -- exactly. I posted a patch for this a while back. It's quite
> trivial: implement a trylock variant of the buffer header lock macro
> and further guard the check with a non-locking test (which TAS()
> already does generally, but the idea is to avoid the cache line lock
> in likely cases of contention). I believe this to be unambiguously
> better: even if it's self healing or unlikely, there is no good reason
> to jump into a spinlock fray or even request a contended cache line
> while holding a critical lock.

IIRC you had problems proving the benefits of that, right?

I think that's because buffer header locks are held so briefly that
it's really unlikely to find a buffer header spinlock locked when
reading it. In the majority of cases the spinlock acquisition will have
made the locker the exclusive owner of the cache line containing the
spinlock, and as soon as that happens the cache miss/transfer takes far
longer than the lock is held.
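
For reference, the trylock idea under discussion could be sketched
roughly as follows. This is a standalone illustration using C11 atomics,
not the patch that was posted and not PostgreSQL's actual buffer header
code; DemoBufferHdr and the demo_* functions are hypothetical names made
up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer header; not PostgreSQL's BufferDesc. */
typedef struct DemoBufferHdr
{
    atomic_int  lock;           /* 0 = free, 1 = held */
    int         usage_count;    /* protected by lock */
} DemoBufferHdr;

static void
demo_init_hdr(DemoBufferHdr *buf)
{
    atomic_init(&buf->lock, 0);
    buf->usage_count = 0;
}

/*
 * Try to take the header lock without ever spinning.
 * Returns true if the lock was acquired, false if it looked busy.
 */
static bool
demo_try_lock_hdr(DemoBufferHdr *buf)
{
    /*
     * Non-locking pre-test: a plain load only needs the cache line in
     * shared state, so a busy lock is detected without requesting
     * exclusive ownership of the line.
     */
    if (atomic_load_explicit(&buf->lock, memory_order_relaxed) != 0)
        return false;

    /* The real test-and-set; succeeds only if the old value was 0. */
    return atomic_exchange_explicit(&buf->lock, 1,
                                    memory_order_acquire) == 0;
}

static void
demo_unlock_hdr(DemoBufferHdr *buf)
{
    assert(atomic_load_explicit(&buf->lock, memory_order_relaxed) == 1);
    atomic_store_explicit(&buf->lock, 0, memory_order_release);
}
```

A sweep using this would simply skip to the next buffer whenever
demo_try_lock_hdr() returns false, instead of waiting.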

I think this is the wrong level to optimize things. IMO there are two
possible solutions (that don't exclude each other):

* perform the clock sweep in one process so there's a very fast way to
get to a free buffer. Possibly in a partitioned way.

* Don't take a global exclusive lock while performing the clock
sweep. Instead increase StrategyControl->nextVictimBuffer in chunks
under an exclusive lock, and then scan the potential victim buffers in
those chunks without a global lock held.
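
A minimal sketch of the second option, assuming a simplified model:
the global lock is held only long enough to advance the clock hand by a
whole chunk, and the usage counts in the claimed chunk are then examined
with no global lock held. DemoStrategy, the demo_* functions, and the
DEMO_* sizes are hypothetical; pins, buffer header locks, and the
freelist are left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_NBUFFERS 16    /* assumed pool size for the example */
#define DEMO_CHUNK     4    /* buffers claimed per lock acquisition */

typedef struct DemoStrategy
{
    atomic_flag lock;                   /* stand-in for the global lock */
    int         next_victim;            /* clock hand, protected by lock */
    atomic_int  usage[DEMO_NBUFFERS];   /* per-buffer usage counts */
} DemoStrategy;

static void
demo_strategy_init(DemoStrategy *s)
{
    assert(DEMO_NBUFFERS % DEMO_CHUNK == 0);
    atomic_flag_clear(&s->lock);
    s->next_victim = 0;
    for (int i = 0; i < DEMO_NBUFFERS; i++)
        atomic_init(&s->usage[i], 0);
}

/* Advance the clock hand by one chunk under the lock; return chunk start. */
static int
demo_claim_chunk(DemoStrategy *s)
{
    int         start;

    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;                       /* spin; lock covers only the lines below */
    start = s->next_victim;
    s->next_victim = (start + DEMO_CHUNK) % DEMO_NBUFFERS;
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
    return start;
}

/* Scan claimed chunks without the global lock; return a victim index. */
static int
demo_get_victim(DemoStrategy *s)
{
    for (;;)
    {
        int         start = demo_claim_chunk(s);

        for (int i = 0; i < DEMO_CHUNK; i++)
        {
            int         idx = (start + i) % DEMO_NBUFFERS;
            int         old = atomic_load(&s->usage[idx]);

            /* Decrement the usage count, but never below zero. */
            while (old > 0 &&
                   !atomic_compare_exchange_weak(&s->usage[idx],
                                                 &old, old - 1))
                ;
            if (old == 0)
                return idx;     /* usage count was already zero */
        }
    }
}
```

Since each backend works on its own chunk, the only serialized part is
the handful of instructions in demo_claim_chunk().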

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
