Re: Reducing spinlock acquisition within clock sweep loop

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing spinlock acquisition within clock sweep loop
Date: 2015-04-21 21:30:39
Message-ID: CAHyXU0zygsV41vYkz7h7Y3Adirhg3URiyCTckRLq8ym10arUsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 21, 2015 at 3:25 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2015-04-21 14:46:04 -0500, Merlin Moncure wrote:
>> The main sources of contention, buffer_strategy_lock, has been removed
>> FWICT via 5d7962c6 and d72731a7. However, during the sweep each tick
>> locks the buffer header via spinlock in order to to adjust
>> usage_count.
>
> FWIW, I think the best approach to achieve this is to make most buffer
> header manipulations lockless. It's not trivial but quite possible. I'd
> a very rough POC a while back ([1]); that only degenerated to something
> spinning for a couple edge cases (!BM_VALID || BM_PIN_COUNT_WAITER
> IIRC). That'd not necessarily reduce the number of atomic ops, but
> making things lockless makes does away with the problem of sleeping
> while holding a spinlock etc.

Right. lockless doesn't necessarily translate to better performance
though if you have to increase atomic ops to get away with it.

> The primary use case for that is less the clock sweep than things index
> root pages and small lookup pages. I've seen several times that the
> spinlock can really hurt there.

Yeah. I'm trying to address that basically...see below.

>> Clock sweep buffer acquisition currently looks like this:
>> loop:
>> tick();
>> spinlock();
>> in use? goto loop
>> usage_count > 0? yes: decrement, goto loop
>> return buffer;
>>
>> Proposed buffer acquisition would look like:
>> loop:
>> tick();
>> victim < last_victim? yes: atomic re-fetch completePasses (see below)
>> usage_count > completePasses? yes: goto loop
>> try_spinlock(); no lock? goto loop
>> usage_count > completePasses? yes: goto loop
>> in use? goto loop
>> return buffer;
>>
>> (try_spinlock is an adorned TAS() as a opposed to a full spin). Usage
>> count is double checked after lock acquisition in case the local copy
>> is stale.
>
> I don't really see what this improves? You could do the "try spinlock"
> bit just as well with the old algorithm and I don't really see a
> accuracy benefit in the proposed approach? Maybe I'm missing something
> entirely?

Yes: the 'try lock' isn't necessary here, i just threw it in. It's
been suggested few times in the past...I see no useful reason why the
clocksweep (using the current algorithm) should sit and spin.
Contended buffers should be hopped over. The real win is in
eliminating the spinlock during clock tick except for buffers that are
real candidates for being returned. refcount is a different problem,
but has different characteristics; it has to be 100% accurate while
usage_count does not, something which can be exploited (previously I
suggested to simply remove all locking around it but I'm discarding
that idea as there's no way to really prove it works well in unusual
circumstances).

> [1]: The basic trick I have in mind is that by putting flags, usagecount
> and refcount into one integer (in separate bits) it is possible to do
> CAS loops to manipulate each of those. E.g. to decrement usagecount you
> can do something roughly like
>
> do
> {
> val = pg_atomic_read_u32(desc->flag_and_counts);
> if (BufferDescUsageCount(val) == 0)
> break;
>
> newval = BufferDescChangeUsageCount(val, -1);
> }
> while(!pg_atomic_compare_exchange_u32(&desc->flag_and_counts,
> &val, newval))
>
> Where BufferDescUsageCount just to some bit masking and shifting to
> manipulate the right part. Same for refcount and most flags.
>
> To deal with changing tags I'd introduced a flag value that was used to
> say 'you have to spinlock this time'.

Interesting. I'm not sure sure that these ideas are contradictory,
especially since refcount and usage_count are not always manipulated
at the same time.

Maybe you can get the best of both worlds; lockless pin/unpin with a
lock free sweep: you'd have a fancy 'BufferDescIncreaseUsageCount',
but not a decrease, since the completePasses (or some derivative of
it), increases to give you the 'decrease' which is implicit as the
clock sweeps around. What I'm proposing would decrease the atomic ops
from the current 'atomic increment + spinlock' to a simple
'increment', plus the unusual atomic read to re-fetch completePasses.

OTOH, one tricky bit I didn't mention is that inferring usage_count
from an always incrementing value and completePasses would require the
'increment' routine to have knowledge of completePasses where
previously it doesn't need it to support the 'max 5' check, requiring
an atomic read, although perhaps not on every pin; a local copy would
mostly do.

merlin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-04-21 21:36:41 Re: Shouldn't CREATE TABLE LIKE copy the relhasoids property?
Previous Message Dean Rasheed 2015-04-21 21:21:53 Re: Row security violation error is misleading