Re: Wait free LW_SHARED acquisition - v0.2

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wait free LW_SHARED acquisition - v0.2
Date: 2014-10-22 14:34:22
Message-ID: 20141022143422.GF5790@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-10-21 19:56:05 +0530, Amit Kapila wrote:
> On Wed, Oct 8, 2014 at 6:17 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> >
> > On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> > > 2.
> > > LWLockWakeup()
> > > {
> > > ..
> > > #ifdef LWLOCK_STATS
> > > lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> > > #else
> > > SpinLockAcquire(&lock->mutex);
> > > #endif
> > > ..
> > > }
> > >
> > > Earlier, while releasing the lock, we didn't count it towards the
> > > LWLock stat spin_delay_count. I think if we look at other places in
> > > lwlock.c, it only gets counted when we try to acquire the spinlock in
> > > a loop.
> >
> > I think the previous situation was clearly suboptimal. I've now modified
> > things so that all spinlock acquisitions are counted.
>
> The code mainly has 4 stats (sh_acquire_count, ex_acquire_count,
> block_count, spin_delay_count) to track. If I try to look at
> all the stats together to understand the contention situation,
> the unpatched code makes sense.

I don't think it does. It completely disregards that the contention may
actually be in LWLockRelease(). That contributes to spinlock
contention just as much as LWLockAcquire().

> spin_delay_count gives
> how much delay happened while acquiring the spinlock, which, when
> combined with the other stats, gives a clear picture of
> the contention around the acquisition of the corresponding LWLock.
> Now if we want to count the spinlock delay for the Release call
> as well, then the meaning of the stat changes.
> It might be that the new meaning of the spin_delay_count stat is more
> useful in some situations, however the older one has its own
> benefits, so I am not sure if changing this as part of this
> patch is the best thing to do.

In which case does the old definition make sense, where the new one
doesn't? I don't think it exists.

And changing it here seems to make sense because this patch fundamentally
changes the stat's meaning for lwlocks anyway, as in most paths we
don't take a spinlock anymore.

> > > 5.
> > > LWLockWakeup()
> > > {
> > > ..
> > > dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> > > {
> > > PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
> > > LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> > > dlist_delete(&waiter->lwWaitLink);
> > > pg_write_barrier();
> > > waiter->lwWaiting = false;
> > > PGSemaphoreUnlock(&waiter->sem);
> > > }
> > > ..
> > > }
> > >
> > > Why can't we decrement the nwaiters after waking up? I don't think
> > > there is any major problem even if callers do that themselves, but
> > > in some rare cases LWLockRelease() might spuriously assume that
> > > there are some waiters and tries to call LWLockWakeup(). Although
> > > this doesn't create any problem, keeping the value sane is good unless
> > > there is some problem in doing so.
> >
> > That won't work because then LWLockWakeup() wouldn't be called when
> > necessary - precisely because nwaiters is 0.
>
> > The reason I've done so is that it's otherwise much harder to debug
> > issues where there are backends that have been woken up already, but
> > haven't rerun yet. Without this there's simply no evidence of that
> > state. I can't see this being relevant for performance, so I'd rather
> > have it stay that way.
>
> I am not sure what useful information we can get during debugging by not
> doing this in LWLockWakeup()

It's useful because you can detect backends that have been scheduled to
acquire the lock, but haven't yet. They're otherwise "invisible".

> and w.r.t. performance it can lead to an extra
> function call, a few checks, and I think in some cases even an extra
> acquire/release of the spinlock.

I fail to see how that could be the case. And again, this is code that's
only executed around a couple of syscalls. And the cacheline will be
touched around there *anyway*.

> > > 6.
> > > LWLockWakeup()
> > > {
> > > ..
> > > dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> > > {
> > > ..
> > > if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
> > > continue;
> > > ..
> > > if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
> > > {
> > > ..
> > > wokeup_somebody = true;
> > > }
> > > ..
> > > }
> > > ..
> > > }
> > >
> > > a.
> > > IIUC above logic, if the waiter queue is as follows:
> > > (S-Shared; X-Exclusive) S X S S S X S S
> > >
> > > it can skip the exclusive waiters and release shared waiter.
> > >
> > > If my understanding is right, then I think instead of continue, there
> > > should be *break* in above logic.
> >
> > No, it looks correct to me. What happened is that the first S was woken
> > up. So there's no point in waking up an exclusive locker, but further
> > non-exclusive lockers can be woken up.
>
> Okay, even then it changes the current wakeup
> logic, which I am not sure is what this patch is intended
> for.

It's already done in a separate patch...

> > > b.
> > > Consider below sequence of waiters:
> > > (S-Shared; X-Exclusive) S S X S S
> > >
> > > I think as per the un-patched code, it will wake up waiters up till
> > > (including) the first Exclusive one, but the patch will wake up till
> > > (*excluding*) the first Exclusive one.
> >
> > I don't think the current code does that.
>
> LWLockRelease()
> {
> ..
> 	/*
> 	 * If the front waiter wants exclusive lock, awaken him only.
> 	 * Otherwise awaken as many waiters as want shared access.
> 	 */
> 	if (proc->lwWaitMode != LW_EXCLUSIVE)
> 	{
> 		while (proc->lwWaitLink != NULL &&
> 			   proc->lwWaitLink->lwWaitMode != LW_EXCLUSIVE)
> 		{
> 			if (proc->lwWaitMode != LW_WAIT_UNTIL_FREE)
> 				releaseOK = false;
> 			proc = proc->lwWaitLink;
> 		}
> 	}
> 	/* proc is now the last PGPROC to be released */
> 	lock->head = proc->lwWaitLink;
> 	proc->lwWaitLink = NULL;
> ..
> }
>
> In the above code, if the first waiter to be woken up is an Exclusive
> waiter, then it will wake that waiter alone; otherwise it will wake
> shared waiters until it reaches the first exclusive waiter, and then
> the first exclusive waiter as well.

That would be a bug then. Per the comment you quoted: "If the front
waiter wants exclusive lock, awaken him only. Otherwise awaken as many
waiters as want shared access."

But I don't think it's what's happening. Note that 'proc =
proc->lwWaitLink;' is only executed if 'proc->lwWaitLink->lwWaitMode !=
LW_EXCLUSIVE'. Which is the next waiter...

> > And it'd be a pretty pointless
> > behaviour, leading to useless increased contention. The only time it'd
> > make sense for X to be woken up is when it gets run faster than the S
> > processes.
>
> Do we get any major benefit by changing the logic of waking up waiters?

Yes.

> > > 7.
> > > LWLockWakeup()
> > > {
> > > ..
> > > dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> > > {
> > > ..
> > > dlist_delete(&waiter->lwWaitLink);
> > > dlist_push_tail(&wakeup, &waiter->lwWaitLink);
> > > ..
> > > }
> > > ..
> > > }
> > >
> > > Use of dlist has simplified the code, but I think there might be a
> > > slight overhead of maintaining the wakeup queue as compared to the
> > > un-patched mechanism, especially when there is a long waiter queue.
> >
> > I don't see that as being relevant. The difference is an instruction or
> > two - in the slow path we'll enter the kernel and sleep. This doesn't
> > matter in comparison.
>
> Okay, however I see Robert has also raised a point on this issue
> which I am not sure has been concluded.
>
> > And the code is *so* much more readable.
>
> Code is more readable, but I don't understand why you
> want to do this refactoring as part of this patch, which ideally
> doesn't get any benefit from it.

I did it without at first. But there's required stuff like
LWLockDequeueSelf(), and I had several bugs because of the list
handling. And I did separate the conversion into its own patch?

> > > 8.
> > > LWLockConditionalAcquire()
> > > {
> > > ..
> > > /*
> > > * We ran into an exclusive lock and might have blocked another
> > > * exclusive lock from taking a shot because it took a time to back
> > > * off. Retry till we are either sure we didn't block somebody (because
> > > * somebody else certainly has the lock) or till we got it.
> > > *
> > > * We cannot rely on the two-step lock-acquisition protocol as in
> > > * LWLockAcquire because we're not using it.
> > > */
> > > if (potentially_spurious)
> > > {
> > > SPIN_DELAY();
> > > goto retry;
> > > }
> > > ..
> > > }
> > >
> > > Due to the above logic, I think it can keep on retrying for a long
> > > time before it actually concludes whether it got the lock or not, in
> > > case other backends take the Exclusive lock after *double_check* and
> > > release it before the unconditional increment of the shared lock in
> > > function LWLockAttemptLock.
> > > I understand that it might be difficult to have such a practical
> > > scenario, however there is still a theoretical possibility of the
> > > same.
> >
> > I'm not particularly concerned. We could optimize it a bit, but I really
> > don't think it's necessary.
>
> No issues, I was slightly worried about cases where this
> API wasn't supposed to take time earlier (like for the content lock in
> BufferAlloc), but now it starts taking time.

The API previously acquired a spinlock. That also took time...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
