Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Date: 2014-02-14 12:28:47
Message-ID: DB2AB266-C89A-43FE-810A-0C316FFDA6B7@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Feb14, 2014, at 11:45 , Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-02-13 15:34:09 +0100, Florian Pflug wrote:
>> On Feb10, 2014, at 17:38 , Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
>>>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>>>>> So what we need to do is to acquire a write barrier between the
>>>>> assignments to lwWaitLink and lwWaiting, i.e.
>>>>> proc->lwWaitLink = NULL;
>>>>> pg_write_barrier();
>>>>> proc->lwWaiting = false;
>>>>
>>>> You didn't really explain why you think that ordering is necessary?
>>>> Each proc being awoken will surely see both fields updated, and other
>>>> procs won't be examining these fields at all, since we already delinked
>>>> all these procs from the LWLock's queue.
>>>
>>> The problem is that one the released backends could wake up concurrently
>>> because of a unrelated, or previous PGSemaphoreUnlock(). It could see
>>> lwWaiting = false, and thus wakeup and acquire the lock, even if the
>>> store for lwWaitLink hasn't arrived (or performed, there's no guaranteed
>>> ordering here) yet.
>>> Now, it may well be that there's no practical consequence of that, but I
>>> am not prepared to bet on it.
>>
>> AFAICS there is a potential problem if three backends are involved, since
>> by the time the waiting backend's lwWaitLink is set to NULL after the
>> original lock holder released the lock, the waiting backend might already
>> have acquired the lock (due to a spurious wakeup) *and* a third backend
>> might have already enqueued behind it.
>>
>> The exact sequence for backends A,B,C that corrupts the wait queue is:
>>
>> A: Release lock, set B's lwWaiting to false
>> B: Wakes up spuriously, takes the lock
>> C: Enqueues behind B
>> A: Sets B' lwWaitLink back to NULL, thereby truncating the queue and
>> causing C and anyone behind it to block indefinitely.
>
> I don't think that can actually happen because the head of the wait list
> isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
> for a while...

Hm, true, but does that protect us under all circumstances? If another
backend grabs the lock before B gets a chance to do so, B will become the
wait queue's head, and anyone who enqueues behind B will do so by updating
B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
by the original lock holder.

The specific sequence I have in mind is:

A: Take lock
B: Enqueue
A: Release lock, set B's lwWaiting to false
D: Acquire lock
B: Enqueue after spurious wakeup
(lock->head := B)
C: Enqueue behind B
(B->lwWaitLink := C, lock->tail := C)
A: Set B's wWaitLink back to NULL, thereby corrupting the queue
(B->lwWaitLink := NULL)
D: Release lock, dequeue and wakeup B
(lock->head := B->lwWaitLink, i.e. lock->head := NULL)
B: Take and release lock, queue appears empty!
C blocks indefinitely.

>> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
>> We take the backends we awake off the queue by updating the queue's head and
>> tail, so the contents of lwWaitLink should only matter once the backend is
>> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
>> back to NULL anway.
>
> I don't like that, this stuff is hard to debug already, having stale
> pointers around doesn't help. I think we should just add the barrier in
> the releases with barrier.h and locally use a volatile var in the
> branches before that.

I like the idea of updating lwWaiting and lwWaitLink while still holding the
spinlock better. The costs are probably negligible, and it'd make the code much
easier to reason about.

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-02-14 12:36:34 Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Previous Message Andres Freund 2014-02-14 12:05:22 walsender doesn't send keepalives when writes are pending