From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: LWLock deadlock and gdb advice |
Date: | 2015-07-31 10:41:23 |
Message-ID: | 55BB50D3.9000702@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 07/30/2015 09:14 PM, Andres Freund wrote:
> On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote:
>> In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
>> until it has updated the variable. And LWLockWaitForVar() holds the spinlock
>> when it checks that the lock is held and that the variable's value matches.
>> So it cannot happen on 9.4.
>
> The first paragraph talked about "the same value", but that was just
> referring to it not yet having been cleared i think...
>
>> To reiterate, with 9.5, it's possible that a backend is sleeping in
>> LWLockWaitForVar(oldvar=123), even though the lock is currently held by
>> another backend with value 124. That seems wrong, or surprising at the very
>> least.
>
> With my patch that can't really happen that way though? The value is
> re-checked after queuing. If it has changed by then we're done. And if
> it hasn't yet changed we're guaranteed to be woken up once it's being
> changed?
Ok, let me try explaining it again. With your patch,
LWLockAcquireWithVar looks like this (simplified, assuming the lock is
free):
1. Set LW_VAL_EXCLUSIVE
3. (Acquire spinlock) Set *valptr = val (Release spinlock)
LWLockWaitForVar looks like this:
1. Check if lock is free
2. (Acquire Spinlock) Read *valptr, compare with oldval (Release spinlock)
3. LWLockQueueSelf
4. Re-check that *valptr is still equal to oldval
5. sleep.
This is the race condition:
Backend A Backend B
--------- ---------
LWLockAcquireWithVar(123)
Set LW_VAL_EXCLUSIVE
Set *valptr = 123
LWLockWaitForVar(123)
See that the lock is held
Read *valptr, it matches
LWLockQueueSelf
LWLockRelease()
LWLockAcquireWithVar(124)
Set LW_VAL_EXCLUSIVE
wakes up
See that the lock is still/again held
Read *valptr, it's still 123
LWLockQueueSelf
Re-check that *valptr is still 123.
go to sleep.
Set *valptr = 124
Now, Backend B's LWLockWaitForVar(123) call is sleeping, even though the
lock was free'd and reacquired with different value, 124. It won't wake
up until A updates the value or releases the lock again.
This was not possible in 9.4. It was possible in 9.4 too when the value
was same in both LWLockAcquireWithVar() calls, and I think that's
acceptable, but not when the values differ.
> I generaly don't mind adding some sort of flag clearing or such, but I'd
> rather not have it in the retry loop in the general LWLockAttemptLock
> path - I found that very small amounts of instructions in there have a
> measurable impact.
I doubt clearing a bit would matter there, although you might have a
better instinct on that...
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Jeremy Harris | 2015-07-31 11:21:54 | Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort" |
Previous Message | Ashutosh Bapat | 2015-07-31 10:33:10 | Re: Transactions involving multiple postgres foreign servers |