Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Date: 2014-02-14 12:52:45
Message-ID: EE0DD308-3A11-47E7-99FA-1F41C29DE448@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Feb14, 2014, at 13:36 , Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-02-14 13:28:47 +0100, Florian Pflug wrote:
>>> I don't think that can actually happen because the head of the wait list
>>> isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
>>> for a while...
>>
>> Hm, true, but does that protect us under all circumstances? If another
>> backend grabs the lock before B gets a chance to do so, B will become the
>> wait queue's head, and anyone who enqueues behind B will do so by updating
>> B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
>> by the original lock holder.
>>
>> The specific sequence I have in mind is:
>>
>> A: Take lock
>> B: Enqueue
>> A: Release lock, set B's lwWaiting to false
>> D: Acquire lock
>> B: Enqueue after spurious wakeup
>> (lock->head := B)
>> C: Enqueue behind B
>> (B->lwWaitLink := C, lock->tail := C)
>> A: Set B's wWaitLink back to NULL, thereby corrupting the queue
>> (B->lwWaitLink := NULL)
>> D: Release lock, dequeue and wakeup B
>> (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
>> B: Take and release lock, queue appears empty!
>> C blocks indefinitely.
>
> That's assuming either reordering by the compiler or an out-of-order
> store architecture, right?

Yes, it requires that a backend exits out of the PGSemaphoreLock loop in
LWLockAcquire before lwWaitLink has been reset to NULL by the previous lock
holder's LWLockRelease. Only if that happens there is a risk of the PGPROC
being on a wait queue by the time lwWaitLink is finally reset to NULL.

>>>> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
>>>> We take the backends we awake off the queue by updating the queue's head and
>>>> tail, so the contents of lwWaitLink should only matter once the backend is
>>>> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
>>>> back to NULL anway.
>>>
>>> I don't like that, this stuff is hard to debug already, having stale
>>> pointers around doesn't help. I think we should just add the barrier in
>>> the releases with barrier.h and locally use a volatile var in the
>>> branches before that.
>>
>> I like the idea of updating lwWaiting and lwWaitLink while still holding the
>> spinlock better. The costs are probably negligible, and it'd make the code much
>> easier to reason about.
>
> I agree we should do that, but imo not in the backbranches. Anything
> more than than the minimal fix in that code should be avoided in the
> stable branches, this stuff is friggin performance sensitive, and the
> spinlock already is a *major* contention point in many workloads.

No argument there. But unless I missed something, there actually is a bug
there, and using volatile won't fix it. A barrier would, but what about the
back branches that don't have barrier.h? AFAICS the only other choices we have on
these branches are to either ignore the bug - it's probably *extremely* unlikely
- or to use a spinlock acquire/release cycle to create a barrier. The former
leaves me with a bit of an uneasy feeling, and the latter quite certainly has
a larger performance impact than moving the PGPROC updates within the section
protected by the spinlock and using an array to remember the backends to wakeup.

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2014-02-14 12:55:06 Re: walsender doesn't send keepalives when writes are pending
Previous Message Andres Freund 2014-02-14 12:40:14 Re: issue with gininsert under very high load