Re: LWLock deadlock and gdb advice

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki <hlinnaka(at)iki(dot)fi>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LWLock deadlock and gdb advice
Date: 2015-07-19 18:49:14
Message-ID: CAMkU=1x265eWfhXanB1ZXphAVuTanSYwuL-iaaqz9h2u5g8A=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 06/30/2015 11:24 PM, Andres Freund wrote:
>
>> On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:
>>
>>> Hm. Right. A recheck of the value after the queuing should be sufficient
>>>> to fix? That's how we deal with the exact same scenarios for the normal
>>>> lock state, so that shouldn't be very hard to add.
>>>>
>>>
>>> Yeah. It's probably more efficient to not release the spinlock between
>>> checking the value and LWLockQueueSelf() though.
>>>
>>
>> Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
>> way in a bunch of callsites... So that'd be harder. Additionally I'm
>> planning to get rid of the spinlocks around queuing (they show up as
>> bottlenecks in contended workloads), so it seems more future proof not
>> to assume that either way. On top of that I think we should, when
>> available (or using the same type of fallback as for 32bit?), use 64 bit
>> atomics for the var anyway.
>>
>> I'll take a stab at fixing this tomorrow.
>>>
>>
>> Thanks! Do you mean both or "just" the second issue?
>>
>
> Both. Here's the patch.
>
> Previously, LWLockAcquireWithVar set the variable associated with the lock
> atomically with acquiring it. Before the lwlock-scalability changes, that
> was straightforward because you held the spinlock anyway, but it's a lot
> harder/expensive now. So I changed the way acquiring a lock with a variable
> works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
> the current lock holder has updated the variable. The LWLockAcquireWithVar
> function is gone - you now just use LWLockAcquire(), which always clears
> the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
> you want to set the variable immediately. LWLockWaitForVar() always waits
> if the flag is not set, i.e. it will not return regardless of the
> variable's value, if the current lock-holder has not updated it yet.
>
> This passes make check, but I haven't done any testing beyond that. Does
> this look sane to you?

After applying this patch to commit fdf28853ae6a397497b79f, it has survived
testing long enough to convince that this fixes the problem.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-07-19 18:54:16 Re: pg_dump quietly ignore missing tables - is it bug?
Previous Message Noah Misch 2015-07-19 17:18:54 Re: security labels on databases are bad for dump & restore