Re: Wait free LW_SHARED acquisition

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Wait free LW_SHARED acquisition
Date: 2013-09-27 21:39:47
Message-ID: 20130927213947.GC9819@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-09-27 14:46:50 +0200, Florian Pflug wrote:
> On Sep27, 2013, at 00:55 , Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > So the goal is to have LWLockAcquire(LW_SHARED) never block unless
> > somebody else holds an exclusive lock. To produce enough appetite for
> > reading the rest of the long mail, here's some numbers on a
> > pgbench -j 90 -c 90 -T 60 -S (-i -s 10) on a 4xE5-4620
> >
> > master + padding: tps = 146904.451764
> > master + padding + lwlock: tps = 590445.927065
> >
> > That's roughly 400%.
>
> Interesting. I played with pretty much the same idea two years or so ago.
> At the time, I compared a few different LWLock implementations. Those
> were AFAIR
>
> A) Vanilla LWLocks
> B) A + an atomic-increment fast path, very similar to your proposal
> C) B but with a partitioned atomic-increment counter to further
> reduce cache-line contention
> D) A with the spinlock-based queue replaced by a lockless queue
>
> At the time, the improvements seemed to be negligible - they looked great
> in synthetic benchmarks of just the locking code, but didn't translate
> to improved TPS numbers. Though I think the only version that ever got
> tested on more than a handful of cores was C…

I think you really need multi-socket systems to see the big benefits
from this. My laptop barely shows any improvement, while my older
two-socket workstation already shows some in workloads with more
contention than pgbench -S.

From a quick look, at least one of your variants didn't do any
sleeping/queueing at all? In my tests that was tremendously important
for scaling as soon as there was any contention. Which isn't surprising
in the end, because without queueing you essentially have rw-spinlocks,
and those really aren't suitable for many of the lwlocks we use.
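
To make that a bit more concrete, here's a heavily simplified sketch of
what I mean by the fast path. This is C11 atomics and made-up names
(lwlock_sketch, shared_acquire_fastpath, etc.), not the actual patch,
and the wait queue and its spinlock are omitted entirely:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EXCLUSIVE_BIT (1U << 31)
#define SHARED_MASK   (EXCLUSIVE_BIT - 1)

typedef struct
{
    atomic_uint state;     /* exclusive bit | count of shared holders */
    bool        releaseOK; /* protected by the wait-list spinlock */
    /* wait queue and its spinlock omitted */
} lwlock_sketch;

static bool
shared_acquire_fastpath(lwlock_sketch *lock)
{
    /* optimistically register ourselves as a shared holder */
    uint32_t old = atomic_fetch_add(&lock->state, 1);

    if ((old & EXCLUSIVE_BIT) == 0)
        return true;    /* no exclusive holder, we got it, lock-free */

    /*
     * Lost to an exclusive holder: undo the increment and fall back to
     * the slow path, which queues us and sleeps on a semaphore.
     */
    atomic_fetch_sub(&lock->state, 1);
    return false;
}

So an uncontended shared acquisition is a single atomic increment; only
when an exclusive holder is present do we touch the queue and sleep.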

Getting the queueing semantics, including releaseOK, right was what took
me a good amount of time; the atomic-ops part was pretty quick...
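
Roughly, continuing the sketch above, the releaseOK dance amounts to
something like this. The queue helpers (queue_is_empty,
wake_first_waiter) are hypothetical, and releaseOK and the queue would
really be manipulated under the wait-list spinlock, which I'm
handwaving away here:

/* hypothetical helpers, not shown */
static bool queue_is_empty(lwlock_sketch *lock);
static void wake_first_waiter(lwlock_sketch *lock);

static void
shared_release(lwlock_sketch *lock)
{
    uint32_t old = atomic_fetch_sub(&lock->state, 1);

    if ((old & SHARED_MASK) == 1 && /* we were the last shared holder */
        lock->releaseOK &&          /* no wakeup already in flight */
        !queue_is_empty(lock))
    {
        /*
         * Suppress further wakeups until the woken backend has had a
         * chance to run; if it still can't get the lock, it sets
         * releaseOK back to true before requeueing itself and
         * sleeping again.
         */
        lock->releaseOK = false;
        wake_first_waiter(lock);
    }
}

Without that, releases arriving in quick succession keep waking waiters
that just lose the race again, which is exactly the kind of thrashing
that kills scaling under contention.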

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
