Re: Reducing overhead of frequent table locks

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 09:07:34
Message-ID: 20110524090734.GA18831@tornado.gateway.2wire.net
Lists: pgsql-hackers

On Mon, May 23, 2011 at 09:15:27PM -0400, Robert Haas wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >         if (level >= ShareUpdateExclusiveLock)
> >                 ++strong_lock_counts[my_strong_lock_count_partition]
> >                 sfence
> >                 if (strong_lock_counts[my_strong_lock_count_partition] == 1)
> >                         /* marker 1 */
> >                         import_all_local_locks
> >                 normal_LockAcquireEx
> >         else if (level <= RowExclusiveLock)
> >                 lfence
> >                 if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> >                         /* marker 2 */
> >                         local_only
> >                         /* marker 3 */
> >                 else
> >                         normal_LockAcquireEx
> >         else
> >                 normal_LockAcquireEx
> >
> > At marker 1, we need to block until no code is running between markers two and
> > three.  You could do that with a per-backend lock (LW_SHARED by the strong
> > locker, LW_EXCLUSIVE by the backend).  That would probably still be a win over
> > the current situation, but it would be nice to have something even cheaper.
>
> Barring some brilliant idea, or anyway for a first cut, it seems to me
> that we can adjust the above pseudocode by assuming the use of a
> LWLock. In addition, two other adjustments: first, the first line
> should test level > ShareUpdateExclusiveLock, rather than >=, per
> previous discussion. Second, import_all_local_locks needn't really
> move everything; just those locks with a matching locktag. Thus:
>
> !         if (level > ShareUpdateExclusiveLock)
> !                 ++strong_lock_counts[my_strong_lock_count_partition]
> !                 sfence
> !                 for each backend
> !                         take per-backend lwlock for target backend
> !                         transfer fast-path entries with matching locktag
> !                         release per-backend lwlock for target backend
> !                 normal_LockAcquireEx
> !         else if (level <= RowExclusiveLock)
> !                 lfence
> !                 if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> !                         take per-backend lwlock for own backend
> !                         fast-path lock acquisition
> !                         release per-backend lwlock for own backend
> !                 else
> !                         normal_LockAcquireEx
> !         else
> !                 normal_LockAcquireEx

This drops the part about only transferring fast-path entries once when a
strong_lock_counts cell transitions from zero to one. Granted, that itself
requires some yet-undiscussed locking. For that matter, we can't have
multiple strong lockers completing transfers on the same cell in parallel.
Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
strong locker holds for that entire "if" body and while decrementing the
strong_lock_counts cell at lock release.
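
In rough C terms, I'm picturing something like the sketch below.  Every name
here is invented for illustration (none of it exists in the tree), and in
reality the count and lock arrays would live in shared memory rather than as
statics:

    #include "postgres.h"
    #include "storage/lwlock.h"

    #define FAST_PATH_STRONG_PARTITIONS 16  /* assumed cell count */

    /* In reality these would both live in shared memory. */
    static int      strong_lock_counts[FAST_PATH_STRONG_PARTITIONS];
    static LWLockId fast_path_transfer_lock[FAST_PATH_STRONG_PARTITIONS];

    static void
    strong_lock_prepare(int cell)
    {
        /* Serialize with any other strong locker working on this cell. */
        LWLockAcquire(fast_path_transfer_lock[cell], LW_EXCLUSIVE);
        if (++strong_lock_counts[cell] == 1)
            transfer_fast_path_entries_for_cell(cell);  /* hypothetical helper */
        LWLockRelease(fast_path_transfer_lock[cell]);
    }

    static void
    strong_lock_released(int cell)
    {
        /* The same lock covers the decrement at strong-lock release. */
        LWLockAcquire(fast_path_transfer_lock[cell], LW_EXCLUSIVE);
        --strong_lock_counts[cell];
        LWLockRelease(fast_path_transfer_lock[cell]);
    }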

As far as the level of detail of this pseudocode goes, there's no need to hold
the per-backend LWLock while transferring the fast-path entries. You just
need to hold it sometime between bumping strong_lock_counts and transferring
the backend's locks. This ensures that, for example, the backend is not
sleeping in the middle of a fast-path lock acquisition for the whole duration
of this code.
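
On the strong-locker side, that ordering might look roughly like this,
continuing the sketch above (again, every name is invented, and the per-cell
interlock is assumed to be held already):

    static void
    handshake_then_transfer(int cell, const LOCKTAG *tag)
    {
        int     i;

        strong_lock_counts[cell]++;     /* publish the strong lock first */

        for (i = 0; i < MaxBackends; i++)
        {
            /*
             * Cycling backend i's lock means it can no longer be inside a
             * fast-path acquisition begun before the increment; any later
             * attempt will see the nonzero count and take the normal path.
             */
            LWLockAcquire(per_backend_fast_path_lock[i], LW_EXCLUSIVE);
            LWLockRelease(per_backend_fast_path_lock[i]);

            transfer_matching_fast_path_entries(i, tag);  /* hypothetical */
        }
    }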

> Now, a small fly in the ointment is that we haven't got, with
> PostgreSQL, a portable library of memory primitives. So there isn't
> an obvious way of doing that sfence/lfence business.

I was thinking that, if the final implementation could benefit from memory
barrier interfaces, we should create those interfaces now. Start with only a
platform-independent dummy implementation that runs a lock/unlock cycle on a
spinlock residing in backend-local memory. I'm 75% sure that would be
sufficient on all architectures for which we support spinlocks. It may turn
out that we can't benefit from such interfaces at this time ...
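
The dummy could be as trivial as this sketch; the function name and the static
lock are invented, though SpinLockAcquire/SpinLockRelease are just the existing
storage/spin.h interface:

    #include "postgres.h"
    #include "storage/spin.h"

    /* Backend-local; would be SpinLockInit()'d once at backend start. */
    static slock_t barrier_dummy_lock;

    static void
    pg_memory_barrier_fallback(void)
    {
        /*
         * A spinlock acquire/release pair should act as a full memory
         * barrier on every platform with working spinlock support, even
         * though this particular lock is never contended.
         */
        SpinLockAcquire(&barrier_dummy_lock);
        SpinLockRelease(&barrier_dummy_lock);
    }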

> Now, it seems to
> me that in the "strong lock" case, the sfence isn't really needed
> anyway, because we're about to start acquiring and releasing an lwlock
> for every backend, and that had better act as a full memory barrier
> anyhow, or we're doomed. The "weak lock" case is more interesting,
> because we need the fence before we've taken any LWLock.

Agreed.

> But perhaps it'd be sufficient to just acquire the per-backend lwlock
> before checking strong_lock_counts[]. If, as we hope, we get back a
> zero, then we do the fast-path lock acquisition, release the lwlock,
> and away we go. If we get back any other value, then we've wasted an
> lwlock acquisition cycle. Or actually maybe not: it seems to me that
> in that case we'd better transfer all of our fast-path entries into
> the main hash table before trying to acquire any lock the slow way, at
> least if we don't want the deadlock detector to have to know about the
> fast-path. So then we get this:
>
> !         if (level > ShareUpdateExclusiveLock)
> !                 ++strong_lock_counts[my_strong_lock_count_partition]
> !                 for each backend
> !                         take per-backend lwlock for target backend
> !                         transfer fastpath entries with matching locktag
> !                         release per-backend lwlock for target backend
> !         else if (level <= RowExclusiveLock)
> !                 take per-backend lwlock for own backend
> !                 if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> !                         fast-path lock acquisition
> !                         done = true
> !                 else
> !                         transfer all fastpath entries
> !                 release per-backend lwlock for own backend
> !         if (!done)
> !                 normal_LockAcquireEx

Could you elaborate on the last part (the need for "else transfer all fastpath
entries") and, specifically, how it aids deadlock avoidance? I didn't think
this change would have any impact on deadlocks, because all relevant locks
will be in the global lock table before any call to normal_LockAcquireEx.

To validate the locking at this level of detail, I think we need to sketch the
unlock protocol, too. On each strong lock release, we'll decrement the
strong_lock_counts cell. No particular interlock with fast-path lockers
should be needed; a stray AccessShareLock needlessly making it into the global
lock table is no problem. As mentioned above, we _will_ need an interlock
with lock transfer operations. How will transferred fast-path locks get
removed from the global lock table? Presumably, the original fast-path locker
should do so at transaction end; anything else would contort the life cycle.
That means adding a way for the backend to know which of its locks have been
transferred, plus an interlock against concurrent transfer operations.  Maybe
that's all.
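
For the weak-locker release path, I'm picturing something along these lines;
the slot layout and all names are only illustrative:

    #include "postgres.h"
    #include "storage/lock.h"
    #include "storage/lwlock.h"

    /* Hypothetical layout for one entry in a backend's fast-path table. */
    typedef struct FastPathSlot
    {
        LOCKTAG     tag;
        bool        in_use;
        bool        transferred;    /* set by a strong locker's transfer */
    } FastPathSlot;

    static LWLockId my_per_backend_fast_path_lock;  /* hypothetical */

    static void
    fast_path_unlock(FastPathSlot *slot)
    {
        bool        was_transferred;

        /* Interlock against a concurrent transfer flipping "transferred". */
        LWLockAcquire(my_per_backend_fast_path_lock, LW_EXCLUSIVE);
        was_transferred = slot->transferred;
        slot->in_use = false;
        LWLockRelease(my_per_backend_fast_path_lock);

        /* A transferred lock now lives in the main table; release it there. */
        if (was_transferred)
            normal_LockRelease(&slot->tag);  /* stands in for the usual release path */
    }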

> That seems like it ought to work, at least assuming the position of
> your fencing instructions was correct in the first place. But there's
> one big problem to worry about: what happens if the lock transfer
> fails due to shared memory exhaustion? It's not so bad in the "weak
> lock" case; it'll feel just like the already-existing case where you
> try to push another lock into the shared-memory hash table and there's
> no room. Essentially you've been living on borrowed time anyway. On
> the other hand, the "strong lock" case is a real problem, because a
> large number of granted fast-path locks can effectively DOS any strong
> locker, even one that wouldn't have conflicted with them. That's
> clearly not going to fly, but it's not clear to me what the best way
> is to patch around it.

To put it another way: the current system is fair; the chance of hitting lock
exhaustion is independent of lock level. The new system would be unfair; lock
exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock
acquisition, through no fault of that transaction. I agree this isn't ideal,
but it doesn't look to me like an unacceptable weakness. Making lock slots
first-come, first-served is inherently unfair; we're not at all set up to
justly arbitrate between mutually-hostile lockers competing for slots. The
overall situation will get better, not worse, for the admin who wishes to
defend against hostile unprivileged users attempting a lock table DOS.

Thanks,
nm
