Quick Links

Re: Fix performance degradation of contended LWLock on NUMA

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Fix performance degradation of contended LWLock on NUMA
Date:	2017-10-18 23:28:01
Message-ID:	20171018232801.r2idhipqm36hdfww@alap3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2017-06-05 16:22:58 +0300, Sokolov Yura wrote:
> Patch changes the way LWLock is acquired.
>
> Before patch, lock is acquired using following procedure:
> 1. first LWLock->state is checked for ability to take a lock, and CAS
> performed (either lock could be acquired or not). And it is retried if
> status were changed.
> 2. if lock is not acquired at first 1, Proc is put into wait list, using
> LW_FLAG_LOCKED bit of LWLock->state as a spin-lock for wait list.
> In fact, profile shows that LWLockWaitListLock spends a lot of CPU on
> contendend LWLock (upto 20%).
> Additional CAS loop is inside pg_atomic_fetch_or_u32 for setting
> LW_FLAG_HAS_WAITERS. And releasing wait list lock is another CAS loop
> on the same LWLock->state.
> 3. after that state is checked again, because lock could be released
> before Proc were queued into wait list.
> 4. if lock were acquired at step 3, Proc should be dequeued from wait
> list (another spin lock and CAS loop). And this operation could be
> quite heavy, because whole list should be traversed.
>
> Patch makes lock acquiring in single CAS loop:
> 1. LWLock->state is read, and ability for lock acquiring is detected.
> If there is possibility to take a lock, CAS tried.
> If CAS were successful, lock is aquired (same to original version).
> 2. but if lock is currently held by other backend, we check ability for
> taking WaitList lock. If wait list lock is not help by anyone, CAS
> perfomed for taking WaitList lock and set LW_FLAG_HAS_WAITERS at once.
> If CAS were successful, then LWLock were still held at the moment wait
> list lock were held - this proves correctness of new algorithm. And
> Proc is queued to wait list then.
> 3. Otherwise spin_delay is performed, and loop returns to step 1.

Right, something like this didn't use to be possible because we'd both
the LWLock->state and LWLock->mutex. But after 008608b9d5106 that's not
a concern anymore.

> Algorithm for LWLockWaitForVar is also refactored. New version is:
> 1. If lock is not held by anyone, it immediately exit.
> 2. Otherwise it is checked for ability to take WaitList lock, because
> variable is protected with it. If so, CAS is performed, and if it is
> successful, loop breaked to step 4.
> 3. Otherwise spin_delay perfomed, and loop returns to step 1.
> 4. var is checked for change.
> If it were changed, we unlock wait list lock and exit.
> Note: it could not change in following steps because we are holding
> wait list lock.
> 5. Otherwise CAS on setting necessary flags is performed. If it succeed,
> then queue Proc to wait list and exit.
> 6. If Case failed, then there is possibility for LWLock to be already
> released - if so then we should unlock wait list and exit.
> 7. Otherwise loop returns to step 5.
>
> So invariant is preserved:
> - either we found LWLock free,
> - or we found changed variable,
> - or we set flags and queue self while LWLock were held.
>
> Spin_delay is not performed at step 7, because we should release wait
> list lock as soon as possible.

That seems unconvincing - by not delaying you're more likely to
*increase* the time till the current locker that holds the lock can
release the lock.

> Additionally, taking wait list lock is skipped in both algorithms if
> SpinDelayStatus.spins is less than part of spins_per_delay loop
> iterations (so, it is initial iterations, and some iterations after
> pg_usleep, because SpinDelayStatus.spins is reset after sleep).

I quite strongly think it's a good idea to change this at the same time
as the other changes you're proposing here. I think it's a worthwhile
thing to pursue, but that change also has quit ethe potential for
regressing in some cases.

- Andres

In response to

Fix performance degradation of contended LWLock on NUMA at 2017-06-05 13:22:58 from Sokolov Yura

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Badrul Chowdhury	2017-10-18 23:35:45	Re: Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)
Previous Message	Michael Paquier	2017-10-18 23:08:13	Re: 64-bit queryId?