Re: LWLock deadlock and gdb advice

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LWLock deadlock and gdb advice
Date: 2015-07-29 17:23:24
Message-ID: 55B90C0C.6020104@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/29/2015 04:10 PM, Andres Freund wrote:
> On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
>> On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
>>> Ah, ok, that should work, as long as you also re-check the variable's value
>>> after queueing. Want to write the patch, or should I?
>>
>> I'll try. Shouldn't be too hard.
>
> What do you think about something roughly like the attached?

Hmm. Imagine this:

Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting
on it. The lock holder releases the lock, and wakes up A. But before A
wakes up and sees that the lock is free, another backend acquires the
lock again. It runs LWLockAcquireWithVar to the point just before
setting the variable's value. Now A wakes up, sees that the lock is
still (or again) held, and that the variable's value still matches the
old one, and goes back to sleep. The new lock holder won't wake it up
until it updates the value again, or to releases the lock.

I think that's OK given the current usage of this in WALInsertLocks, but
it's scary. At the very least you need comments to explain that race
condition. You didn't like the new LW_FLAG_VAR_SET flag and the API
changes I proposed? That eliminates the race condition.

Either way, there is a race condition that if the new lock holder sets
the variable to the *same* value as before, the old waiter won't
necessarily wake up even though the lock was free or had a different
value in between. But that's easier to explain and understand than the
fact that the value set by LWLockAcquireWithVar() might not be seen by a
waiter, until you release the lock or update it again.

Another idea is to have LWLockAcquire check the wait queue after setting
the variable, and wake up anyone who's already queued up. You could just
call LWLockUpdateVar() from LWLockAcquireCommon to set the variable. I
would prefer the approach from my previous patch more, though. That
would avoid having to acquire the spinlock in LWLockAcquire altogether,
and I actually like the API change from that independently from any
correctness issues.

The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK
in principle. You'll want to change LWLockConflictsWithVar() so that the
spinlock is held only over the "value = *valptr" line, though; the other
stuff just modifies local variables and don't need to be protected by
the spinlock. Also, when you enter LWLockWaitForVar(), you're checking
if the lock is held twice in a row; first at the top of the function,
and again inside LWLockConflictsWithVar. You could just remove the
"quick test" at the top.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-07-29 17:42:14 Re: upgrade failure from 9.5 to head
Previous Message Robert Haas 2015-07-29 16:54:59 Re: Reduce ProcArrayLock contention