From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Lockless queue of waiters in LWLock |
Date: | 2022-11-04 19:07:41 |
Message-ID: | 20221104190741.gsuiybes3hula74m@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2022-11-03 14:50:11 +0400, Pavel Borisov wrote:
> Or maybe there is another explanation for now small performance
> difference around 20 connections described in [0]?
> Thoughts?
Using xadd is quite a bit cheaper than cmpxchg, and now every lock release
uses a compare-exchange, I think.
In the past I had a more complicated version of LWLockAcquire which tried to
use an xadd to acquire locks. IIRC (and this is long enough ago that I might
not) that proved to be a benefit, but I was worried about the complexity. And
just getting in the version that didn't always use a spinlock was the higher
priority.
The use of cmpxchg vs lock inc/lock add/xadd is one of the major reasons why
lwlocks are slower than a spinlock (but obviously are better under contention
nonetheless).
I have a benchmark program that starts a thread for each physical core and
just increments a counter on an atomic value.
On my dual Xeon Gold 5215 workstation:
cmpxchg:
32: throughput per thread: 0.55M/s, total: 11.02M/s
64: throughput per thread: 0.63M/s, total: 12.68M/s
lock add:
32: throughput per thread: 2.10M/s, total: 41.98M/s
64: throughput per thread: 2.12M/s, total: 42.40M/s
xadd:
32: throughput per thread: 2.10M/s, total: 41.91M/s
64: throughput per thread: 2.04M/s, total: 40.71M/s
and even when there's no contention, every thread just updating its own
cacheline:
cmpxchg:
32: throughput per thread: 88.83M/s, total: 1776.51M/s
64: throughput per thread: 96.46M/s, total: 1929.11M/s
lock add:
32: throughput per thread: 166.07M/s, total: 3321.31M/s
64: throughput per thread: 165.86M/s, total: 3317.22M/s
add (no lock):
32: throughput per thread: 530.78M/s, total: 10615.62M/s
64: throughput per thread: 531.22M/s, total: 10624.35M/s
xadd:
32: throughput per thread: 165.88M/s, total: 3317.51M/s
64: throughput per thread: 165.93M/s, total: 3318.53M/s
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Nikolay Shaplov | 2022-11-04 19:30:06 | Re: [PATCH] New [relation] option engine |
Previous Message | Justin Pryzby | 2022-11-04 18:38:38 | Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL |