Re: Lockless queue of waiters in LWLock

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Lockless queue of waiters in LWLock
Date: 2022-11-01 18:52:36
Message-ID: CALT9ZEF7q+Sarz1MjrX-fM7OsoU7CK16=ONoGCOkY3Efj+Grnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Andres,
Thank you very much for the ideas on this topic!

> > > If the lock state contains references to the queue head and tail, we can
> > > implement a lockless queue of waiters for the LWLock. Adding new items to
> > > the queue head or tail can be done with a single CAS operation (adding to
> > > the tail will also require further setting the reference from the previous
> > > tail). Given that there could be only one lock releaser, which wakes up
> > > waiters in the queue, we can handle all the concurrency issues with
> > > reasonable complexity.
> >
> > Right now lock releases happen *after* the lock is released.
>
> That makes sense. The patch makes the locker hold the lock, which it's
> processing the queue. So, the lock is held for a longer time.
>
> > I suspect that is
> > at least part of the reason for the regression you're seeing. It also looks
> > like you're going to need a substantially higher number of atomic operations -
> > right now the queue processing needs O(1) atomic ops, but your approach ends
> > up with O(waiters) atomic ops.
>
> Hmm... In the patch, queue processing calls CAS once after processing
> the queue. There could be retries to process the queue parts, which
> were added concurrently. But I doubt it ends up with O(waiters) atomic
> ops. Pavel, I think we could gather some statistics to check how many
> retries we have on average.
>

I've made some measurements on the number of repeated CAS operations
on lock acquire and release. (For this I applied
Print-lwlock-stats-on-CAS-repeats.patch onto the previous patch v1 in
this thread) The results when running the same insert test, that
produced the results in the original post on 20 connections are as
follows:
lwlock ProcArray:
-------------| locks acquired | CAS repeats to acquire | CAS repeats to release
shared | 493187 | 57310 | 12049
exclusive | 46816 | 42329 | 8816
wait-until-free | - | 0
| 76124
blk 79473

> > I suspect it might be worth going halfway, i.e. put the list head/tail in the
> > atomic variable, but process the list with a lock held, after the lock is
> > released.
>
> Good idea. We, anyway, only allow one locker at a time to process the queue.

Alexander added these changes in v2 of a patch. The results of the
same benchmarking master vs Andres' patch vs lockless queue patch v1
and v2 are attached. They are done in the same way as in the original
post. The small difference is that I've gone further until 5000
connections and also produced both log and linear scale connections
axis plots for the more clear demonstration.

Around 20 connections TPS increased though not yet to the same value
the master and Andres' patches has.
And in a range 300-3000 connections the v2 patch demonstrated clear gain.

I'm planning to gather more detailed statistics on different
LWLockAcquire calls soon to understand prospects for further
optimizations.

Best regards,
Pavel Borisov

Attachment Content-Type Size
txid-v1-v2-log.png image/png 139.4 KB
insert-v1-v2-log.png image/png 136.7 KB
txid-v1-v2.png image/png 146.5 KB
insert-v1-v2.png image/png 146.6 KB
Print-lwlock-stats-on-CAS-repeats.patch application/octet-stream 7.6 KB
v2-0001-Lockless-queue-of-LWLock-waiters.patch application/octet-stream 42.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari Mannsåker 2022-11-01 18:55:43 Re: perl 5.36, C99, -Wdeclaration-after-statement -Wshadow=compatible-local
Previous Message Tom Lane 2022-11-01 18:35:43 Re: pg15 inherited stats expressions: cache lookup failed for statistics object