Re: Lockless queue of waiters in LWLock

From: Andres Freund <andres(at)anarazel(dot)de>
To: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Lockless queue of waiters in LWLock
Date: 2022-10-31 23:15:17
Message-ID: 20221031231517.wvkcj533c4rnm5zq@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for working on this - I think it's something we need to improve.

On 2022-10-31 14:38:23 +0400, Pavel Borisov wrote:
> If the lock state contains references to the queue head and tail, we can
> implement a lockless queue of waiters for the LWLock. Adding new items to
> the queue head or tail can be done with a single CAS operation (adding to
> the tail will also require further setting the reference from the previous
> tail). Given that there could be only one lock releaser, which wakes up
> waiters in the queue, we can handle all the concurrency issues with
> reasonable complexity.

Right now lock releases happen *after* the lock is released. I suspect that is
at least part of the reason for the regression you're seeing. It also looks
like you're going to need a substantially higher number of atomic operations -
right now the queue processing needs O(1) atomic ops, but your approach ends
up with O(waiters) atomic ops.

I suspect it might be worth going halfway, i.e. put the list head/tail in the
atomic variable, but process the list with a lock held, after the lock is
released.

> Removing the queue spinlock bit and the corresponding contention should
> give the performance gain on high concurrent LWLock usage scenarios.

> Currently, the maximum size of procarray is 2^18 (as stated in
> buf_internals.h), so with the use of the 64-bit LWLock state variable, we
> can store the procarray indexes for both head and tail of the queue, all
> the existing machinery, and even have some spare bits for future flags.

Another advantage: It shrinks the LWLock struct, which makes it cheaper to
make locks more granular...

> I can not understand why the performance of a lockless queue patch has a
> minor regression in the region of around 20 connections, even when compared
> to the current master branch.

I suspect that's the point where the additional atomic operations hurt the
most, while not yet providing a substantial gain.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2022-10-31 23:36:16 Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
Previous Message Jonathan S. Katz 2022-10-31 22:52:20 Re: User functions for building SCRAM secrets