Re: s_lock() seems too aggressive for machines with many sockets

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jan Wieck <jan(at)wi3ck(dot)info>
Cc: Nils Goroll <slink(at)schokola(dot)de>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: s_lock() seems too aggressive for machines with many sockets
Date: 2015-06-10 15:34:44
Message-ID: 20150610153444.GF10551@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-06-10 11:12:46 -0400, Jan Wieck wrote:
> The test case is that 200 threads are running in a tight loop like this:
>
> for (...)
> {
> s_lock();
> // do something with a global variable
> s_unlock();
> }
>
> That is the most contended case I can think of, yet the short and
> predictable code while holding the lock is the intended use case for a
> spinlock.

But lots of our code isn't that predictable. Check e.g. the buffer
spinlock case:

static void
UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
{
PrivateRefCountEntry *ref;

/* not moving as we're likely deleting it soon anyway */
ref = GetPrivateRefCountEntry(buf->buf_id + 1, false);
Assert(ref != NULL);

if (fixOwner)
ResourceOwnerForgetBuffer(CurrentResourceOwner,
BufferDescriptorGetBuffer(buf));

Assert(ref->refcount > 0);
ref->refcount--;
if (ref->refcount == 0)
{
/* I'd better not still hold any locks on the buffer */
Assert(!LWLockHeldByMe(buf->content_lock));
Assert(!LWLockHeldByMe(buf->io_in_progress_lock));

LockBufHdr(buf);

/* Decrement the shared reference count */
Assert(buf->refcount > 0);
buf->refcount--;

/* Support LockBufferForCleanup() */
if ((buf->flags & BM_PIN_COUNT_WAITER) &&
buf->refcount == 1)
{
/* we just released the last pin other than the waiter's */
int wait_backend_pid = buf->wait_backend_pid;

buf->flags &= ~BM_PIN_COUNT_WAITER;
UnlockBufHdr(buf);
ProcSendSignal(wait_backend_pid);
}
else
UnlockBufHdr(buf);

ForgetPrivateRefCountEntry(ref);
}
}

If you check the section where the spinlock is held there's nontrivial
code executed. Under contention you'll have the problem that if backend
tries to acquire the the spinlock while another backend holds the lock,
it'll "steal" the cacheline on which the lock and the rest of the buffer
descriptor resides. Which means that operations like buf->refcount--,the
if (), and the &= will now have to wait till the cacheline is
transferred back (as the cacheline will directly be marked as modified
on the attempted locker). All the while other backends will also try to
acquire the pin, causing the same kind of trouble.

If this'd instead be written as

ret = pg_atomic_fetch_sub_u32(&buf->state, 1);

if (ret & BM_PIN_COUNT_WAITER)
{
pg_atomic_fetch_sub_u32(&buf->state, BM_PIN_COUNT_WAITER);
/* XXX: deal with race that another backend has set BM_PIN_COUNT_WAITER */
}

the whole thing looks entirely different. While there's obviously still
cacheline bouncing, every operation is guaranteed to make progress (on
x86 at least, but even elsewhere we'll never block another locker).

Now unlock is the simpler case, but...

- Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-06-10 15:36:04 Re: replication slot restart_lsn initialization
Previous Message Nils Goroll 2015-06-10 15:30:33 Re: s_lock() seems too aggressive for machines with many sockets