Re: Wait free LW_SHARED acquisition - v0.9

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Wait free LW_SHARED acquisition - v0.9
Date: 2014-10-14 18:36:57
Message-ID: CAHyXU0zeAkZyyoAgBePnA8L469gFV6+-XYX+GGsQ=CaegZL9qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 14, 2014 at 8:58 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-10-14 08:40:49 -0500, Merlin Moncure wrote:
>> On Fri, Oct 10, 2014 at 11:00 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > Which is nearly trivial now that atomics are in. Check out the attached
>> > WIP patch which eliminates the spinlock from StrategyGetBuffer() unless
>> > there's buffers on the freelist.
>>
>> Is this safe?
>>
>> + victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);
>>
>> - if (++StrategyControl->nextVictimBuffer >= NBuffers)
>> + buf = &BufferDescriptors[victim % NBuffers];
>> +
>> + if (victim % NBuffers == 0)
>> <snip>
>>
>> I don't think there's any guarantee you could keep nextVictimBuffer
>> from wandering off the end.
>
> Not sure what you mean? It'll cycle around at 2^32. The code doesn't try
> to avoid nextVictimBuffer from going above NBuffers. To avoid overrunning
> &BufferDescriptors I'm doing % NBuffers.
>
> Yes, that'll have the disadvantage of the first buffers being slightly
> more likely to be hit, but for that to be relevant you'd need
> unrealistically large amounts of shared_buffers.

Right -- my mistake. That's clever. I think that should work well.

> I think that's pretty much orthogonal. I *do* think it makes sense to
> increment nextVictimBuffer in bigger steps. But the above doesn't
> prohibit doing so - and it'll still be be much better without the
> spinlock. Same number of atomics, but no potential of spinning and no
> potential of being put to sleep while holding the spinlock.
>
> This callsite has a comparatively large likelihood of being put to sleep
> while holding a spinlock because it can run for a very long time doing
> nothing but spinlocking.

A while back, I submitted a minor tweak to the clock sweep so that,
instead of spinlocking every single buffer header as it swept it just
did a single TAS as a kind of a trylock and punted to the next buffer
if the test failed on the principle there's not good reason to hang
around. You only spin if you passed the first test; that should
reduce the likelihood of actual spinning to approximately zero. I
still maintain there's no reason not to do that (I couldn't show a
benefit but that was because mapping list locking was masking any
clock sweep contention at that time).

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-10-14 19:09:50 Re: pgaudit - an auditing extension for PostgreSQL
Previous Message Alvaro Herrera 2014-10-14 18:21:20 Re: pg_dump refactor patch to remove global variables