From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: spinlock contention
Date: 2011-07-12 13:03:10
Message-ID: 98ACBA06-8F4D-43BE-A653-9907CD836EC4@phlo.org
Lists: pgsql-hackers

On Jul7, 2011, at 03:35 , Robert Haas wrote:
> Some poking around suggests that the problem isn't that
> spinlocks are routinely contended - it seems that we nearly always get
> the spinlock right off the bat. I'm wondering if the problem may be
> not so much that we have continuous spinlock contention, but rather
> that every once in a while a process gets time-sliced out while it
> holds a spinlock. If it's an important spinlock (like the one
> protecting SInvalReadLock), the system will quickly evolve into a
> state where every single processor is doing nothing but trying to get
> that spinlock. Even after the hapless lock-holder gets to run again
> and lets go of the lock, you have a whole pile of other backends who
> are sitting there firing off lock xchgb in a tight loop, and they can
> only get it one at a time, so you have ferocious cache line contention
> until the backlog clears.

Pondering this some more, I came up with another idea, pretty much
orthogonal to the shared counter partitioning I posted a patch for.

If indeed the problem isn't spinlock contention, but rather losing
control of the CPU while holding a spinlock, then why not try to get
rid of the spinlock entirely? On Linux, that's easy - this is exactly
what futexes are about. But it occurred to me that kernel support isn't
actually needed to do that - futexes can effectively be emulated in
user-space using just a semaphore, and without doing a syscall except
when there's contention.
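
To make "emulating a futex" concrete, assume a lock word with a reader
count and an exclusive bit packed into a single atomically-updatable
word, so one CAS covers both shared and exclusive acquisition. The
following is a standalone C11 sketch; the encoding and all names here
are mine, purely for illustration:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: bit 31 = exclusive, bits 0-30 = reader count;
 * 0 means the lock is free. */
#define LW_EXCLUSIVE    ((uint32_t) 1 << 31)

typedef struct LWState
{
    _Atomic uint32_t value;
    /* the lock-free wait queue head would live here too */
} LWState;

/* One shot at taking the lock with a single CAS - no spinning. */
static bool
lw_try_acquire(LWState *lock, bool exclusive)
{
    uint32_t old = atomic_load(&lock->value);

    if (exclusive)
        return old == 0 &&
            atomic_compare_exchange_strong(&lock->value, &old, LW_EXCLUSIVE);

    /* shared: permitted as long as the exclusive bit is clear */
    return (old & LW_EXCLUSIVE) == 0 &&
        atomic_compare_exchange_strong(&lock->value, &old, old + 1);
}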

The algorithm is quite straightforward, if one assumes a lock-free
implementation of a queue (more on that below); a hypothetical C
sketch follows each of the two descriptions.

LockAcquire:
(1) CAS the lock state to increment the reader count or set
the exclusive bit in a loop while the lock looks free.
If successful, we're done, otherwise we continue with (2)
(2) Add ourselves to the wait queue
[ Since we've added ourselves to the queue, we *must* now
decrement the semaphore no matter what, to keep the
increment/decrement calls balanced. We're careful to
maintain that invariant below. ]
(3) Fence (AKA issue full memory barrier)
(4) Re-check if the lock still looks taken. If it does,
we decrement the semaphore (PGSemaphoreLock), and
(upon wake-up) restart at (1).
Otherwise, continue with (5)
(5) Remove the first waiter from the queue and increment
her semaphore. Rinse-and-repeat until we either removed
ourselves from the queue or the queue is empty.
(6) Decrement our semaphore.
[ (6) is necessary in the general case, see the remark
below (2). But we can of course detect the case where
we'd increment our own semaphore in (5) only to
decrement it again in (6), and skip both operations ]
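
Here's the acquire path in the same standalone C11 sketch, continuing
the lw_try_acquire() snippet above. The queue and semaphore primitives
are placeholders - wq_enqueue()/wq_dequeue() stand in for the lock-free
queue, sem_wait_self() plays the role of PGSemaphoreLock, and
sem_post_waiter() wakes another backend. None of these are real
PostgreSQL APIs:

/* Placeholder declarations; these are assumed, not real APIs. */
typedef struct Waiter Waiter;
extern Waiter *my_waiter(void);            /* this backend's queue entry */
extern void    wq_enqueue(LWState *lock, Waiter *w);  /* lock-free */
extern Waiter *wq_dequeue(LWState *lock);  /* lock-free, NULL if empty */
extern void    sem_wait_self(void);        /* a la PGSemaphoreLock */
extern void    sem_post_waiter(Waiter *w); /* a la PGSemaphoreUnlock */

static void
lw_acquire(LWState *lock, bool exclusive)
{
    for (;;)
    {
        /* (1) try the single-CAS fast path while the lock looks free */
        if (lw_try_acquire(lock, exclusive))
            return;

        /* (2) add ourselves to the wait queue */
        Waiter *self = my_waiter();
        wq_enqueue(lock, self);

        /* (3) full fence: our queue entry must be globally visible
         * before we re-check the lock state */
        atomic_thread_fence(memory_order_seq_cst);

        /* (4) re-check; if still taken, sleep, then restart at (1)
         * (for a shared request, "taken" means the exclusive bit) */
        if (atomic_load(&lock->value) != 0)
        {
            sem_wait_self();
            continue;
        }

        /* (5) the lock looked free after all: drain the queue so no
         * waiter sleeps forever, stopping if we dequeue ourselves */
        bool    dequeued_self = false;
        Waiter *w;

        while ((w = wq_dequeue(lock)) != NULL)
        {
            if (w == self)
            {
                dequeued_self = true;   /* skip the post/wait pair */
                break;
            }
            sem_post_waiter(w);
        }

        /* (6) balance our semaphore: if someone else dequeued us,
         * they will post it, so we must consume that post */
        if (!dequeued_self)
            sem_wait_self();
        /* then retry from (1) */
    }
}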

LockRelease:
(1) Set the lock state to 0, i.e. release the lock.
(2) Fence (AKA issue full memory barrier)
(3) If the lock still looks free, remove the first waiter from
the queue and increment her semaphore. Rinse-and-repeat
while the lock looks free and the queue is non-empty.
[ From a correctness POV, we only have to wake up one
waiter here, and that only if there isn't one who has already been
woken up but hasn't yet retried to take the lock. In reality,
the decision whether and how many waiters to wake up would depend
on their desired lock level, and some variant of what we
currently call releaseOK. ]
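
The matching release path, with the same caveats; the wake-up policy
(releaseOK and lock levels) is deliberately left out:

static void
lw_release(LWState *lock)
{
    /* (1) release the lock unconditionally */
    atomic_store(&lock->value, 0);

    /* (2) full fence: the release must be visible before we
     * inspect the wait queue */
    atomic_thread_fence(memory_order_seq_cst);

    /* (3) while the lock still looks free, wake queued waiters;
     * each one re-runs step (1) of the acquire path on wake-up */
    while (atomic_load(&lock->value) == 0)
    {
        Waiter *w = wq_dequeue(lock);

        if (w == NULL)
            break;
        sem_post_waiter(w);
    }
}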

The fencing steps (LockAcquire:3) and (LockRelease:2) guarantee
that if we block in LockAcquire() a lock holder will see our
queue entry and thus will wake us up eventually.

Because we use a semaphore and not, say, a simple signal, we don't
have to worry about the precise ordering of block and unblock
operations - we just need to ensure they're balanced.
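
A tiny illustration of why the counting makes the ordering immaterial
(POSIX semaphores here, just for demonstration):

#include <semaphore.h>

static sem_t s;     /* initialized with sem_init(&s, 1, 0) at setup */

static void
post_before_wait_is_fine(void)
{
    sem_post(&s);   /* waker runs first: the count simply becomes 1  */
    sem_wait(&s);   /* waiter arrives later and returns immediately; */
                    /* with a plain signal this wakeup would be lost */
}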

Now to the enqueue() and dequeue() primitives that the algorithm
above depends on. There are multiple published algorithms
for lock-free queues. Some googling turned up

"An optimistic approach to lock-free FIFO queues",
E. Ladan-Mozes, N. Shavit,
DOI 10.1007/s00446-007-0050-0

and

"CAS-Based Lock-Free Algorithm for Shared Deques",
M.M. Michael

The second one looks promising, since it only requires a single
CAS to enqueue and dequeue entries in the common case. Thus, it
should be just as efficient as our current spin-lock-based queue
in the uncontended case (and much better in the contended case, of
course).

[ Our case is, in fact, simpler than the generic setting that these
algorithms support. We only ever enqueue our *own* proc array entry
(so allocation and re-use of queue entries isn't much of an issue),
and always *process* the queue after enqueuing something - either
directly in LockAcquire or later in LockRelease. We thus don't really
need to support concurrent removal of queue entries, but might get
away with simply skipping queue processing if we detect that somebody
else is in the process of doing so. I think I have an idea for a
simpler lock-less queue that fits our needs, but I haven't yet
ironed out all the details (As it turns out, removing the very last
entry from the queue is the hard case). ]
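
To give a flavour of how cheap the single-CAS fast path is, here's a
minimal lock-free push in the same C11 sketch style - Treiber-stack
(LIFO) rather than FIFO, so it sidesteps exactly the hard parts
discussed above and is meant only to show the cost profile of a
CAS-based enqueue:

#include <stdatomic.h>
#include <stddef.h>

typedef struct QWaiter
{
    struct QWaiter *_Atomic next;
    /* per-backend semaphore, proc info, ... */
} QWaiter;

typedef struct WaitQueue
{
    QWaiter *_Atomic head;
} WaitQueue;

/* Push our own entry: one CAS in the uncontended case, comparable in
 * cost to a spinlock TAS, but other enqueuers are never blocked if we
 * lose the CPU mid-operation. */
static void
wq_push(WaitQueue *q, QWaiter *self)
{
    QWaiter *old = atomic_load(&q->head);

    do
        atomic_store(&self->next, old);
    while (!atomic_compare_exchange_weak(&q->head, &old, self));
}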

Thoughts, comments and criticism are all very welcome!

best regards,
Florian Pflug
