Re: Reducing overhead of frequent table locks

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 12:53:11
Message-ID: BANLkTikxUXEyupnyOX+2zTjav4Pxz9txBg@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 24, 2011 at 5:07 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> This drops the part about only transferring fast-path entries once when a
> strong_lock_counts cell transitions from zero to one.

Right: that's because I don't think that's what we want to do. I
don't think we want to transfer all per-backend locks to the shared
hash table as soon as anyone attempts to acquire a strong lock;
instead, I think we want to transfer only those fast-path locks which
have the same locktag as the strong lock someone is attempting to
acquire. If we do that, then it doesn't matter whether the
strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7:
we still have to check for strong locks with that particular locktag.

> Granted, that itself
> requires some yet-undiscussed locking.  For that matter, we can't have
> multiple strong lockers completing transfers on the same cell in parallel.
> Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
> strong locker holds for that entire "if" body and while decrementing the
> strong_lock_counts cell at lock release.

I was imagining that the per-backend LWLock would protect the list of
fast-path locks. So to transfer locks, you would acquire the
per-backend LWLock for the backend which has the lock, and then the
lock manager partition LWLock, and then perform the transfer.
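
In sketch form, that transfer step might look something like this
(just a rough sketch: every name here is invented, pthread mutexes
stand in for the per-backend and lock-manager-partition LWLocks, and
lock modes are omitted entirely):

/* Illustrative sketch only; not actual PostgreSQL code. */
#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

#define FP_SLOTS_PER_BACKEND 16

typedef struct FastPathSlot
{
    uint32_t    locktag;            /* stand-in for a real LOCKTAG */
    bool        in_use;
} FastPathSlot;

typedef struct BackendFastPath
{
    pthread_mutex_t lock;           /* stand-in for the per-backend LWLock */
    FastPathSlot    slots[FP_SLOTS_PER_BACKEND];
} BackendFastPath;

/* Stand-ins for the shared lock table and its partition LWLock. */
static pthread_mutex_t lock_partition_lock = PTHREAD_MUTEX_INITIALIZER;

static void
shared_table_insert(uint32_t locktag, int owner_backend)
{
    /* would create the LOCK/PROCLOCK entries on the owner's behalf */
    (void) locktag;
    (void) owner_backend;
}

/*
 * Strong-lock path: before acquiring a > ShareUpdateExclusiveLock lock,
 * move every fast-path entry with the *same* locktag into the shared
 * table, taking the owner's per-backend lock and then the partition lock.
 */
static void
fp_transfer_matching(BackendFastPath *backends, int nbackends,
                     uint32_t locktag)
{
    for (int i = 0; i < nbackends; i++)
    {
        pthread_mutex_lock(&backends[i].lock);
        for (int j = 0; j < FP_SLOTS_PER_BACKEND; j++)
        {
            FastPathSlot *slot = &backends[i].slots[j];

            if (slot->in_use && slot->locktag == locktag)
            {
                pthread_mutex_lock(&lock_partition_lock);
                shared_table_insert(slot->locktag, i);
                pthread_mutex_unlock(&lock_partition_lock);
                slot->in_use = false;   /* lock now lives in the shared table */
            }
        }
        pthread_mutex_unlock(&backends[i].lock);
    }
}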

> As far as the level of detail of this pseudocode goes, there's no need to hold
> the per-backend LWLock while transferring the fast-path entries.  You just
> need to hold it sometime between bumping strong_lock_counts and transferring
> the backend's locks.  This ensures that, for example, the backend is not
> sleeping in the middle of a fast-path lock acquisition for the whole duration
> of this code.

See above; I'm lost.

>> Now, a small fly in the ointment is that we haven't got, with
>> PostgreSQL, a portable library of memory primitives.  So there isn't
>> an obvious way of doing that sfence/lfence business.
>
> I was thinking that, if the final implementation could benefit from memory
> barrier interfaces, we should create those interfaces now.  Start with only a
> platform-independent dummy implementation that runs a lock/unlock cycle on a
> spinlock residing in backend-local memory.  I'm 75% sure that would be
> sufficient on all architectures for which we support spinlocks.  It may turn
> out that we can't benefit from such interfaces at this time ...

OK.
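
Something along these lines, maybe (again just a sketch:
pg_memory_barrier is an invented name, the GCC branch uses the real
__sync_synchronize() builtin, and the fallback is the lock/unlock
cycle on a backend-local lock you describe, with a pthread mutex
standing in for a local slock_t):

#include <pthread.h>

static pthread_mutex_t barrier_dummy_lock = PTHREAD_MUTEX_INITIALIZER;

static inline void
pg_memory_barrier(void)
{
#if defined(__GNUC__)
    __sync_synchronize();           /* compiler-provided full barrier */
#else
    /*
     * Portable fallback: a lock/unlock cycle acts as a full barrier on
     * any platform whose lock primitive is itself implemented correctly
     * -- the "75% sure" assumption above.
     */
    pthread_mutex_lock(&barrier_dummy_lock);
    pthread_mutex_unlock(&barrier_dummy_lock);
#endif
}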

>> Now, it seems to
>> me that in the "strong lock" case, the sfence isn't really needed
>> anyway, because we're about to start acquiring and releasing an lwlock
>> for every backend, and that had better act as a full memory barrier
>> anyhow, or we're doomed.  The "weak lock" case is more interesting,
>> because we need the fence before we've taken any LWLock.
>
> Agreed.
>
>> But perhaps it'd be sufficient to just acquire the per-backend lwlock
>> before checking strong_lock_counts[].  If, as we hope, we get back a
>> zero, then we do the fast-path lock acquisition, release the lwlock,
>> and away we go.  If we get back any other value, then we've wasted an
>> lwlock acquisition cycle.  Or actually maybe not: it seems to me that
>> in that case we'd better transfer all of our fast-path entries into
>> the main hash table before trying to acquire any lock the slow way, at
>> least if we don't want the deadlock detector to have to know about the
>> fast-path.  So then we get this:
>>
>> !        if (level > ShareUpdateExclusiveLock)
>> !                ++strong_lock_counts[my_strong_lock_count_partition]
>> !                for each backend
>> !                        take per-backend lwlock for target backend
>> !                        transfer fastpath entries with matching locktag
>> !                        release per-backend lwlock for target backend
>> !        else if (level <= RowExclusiveLock)
>> !                take per-backend lwlock for own backend
>> !                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>> !                        fast-path lock acquisition
>> !                        done = true
>> !                else
>> !                        transfer all fastpath entries
>> !                release per-backend lwlock for own backend
>> !        if (!done)
>> !                normal_LockAcquireEx
>
> Could you elaborate on the last part (the need for "else transfer all fastpath
> entries") and, specifically, how it aids deadlock avoidance?  I didn't think
> this change would have any impact on deadlocks, because all relevant locks
> will be in the global lock table before any call to normal_LockAcquireEx.

Oh, hmm, maybe you're right. I was concerned about the possibility
of a backend that already holds locks going to sleep on a lock wait,
perhaps running the deadlock detector, and failing to notice a
deadlock. But I guess that can't happen: if any of the locks it holds
are relevant to the deadlock detector, the backend attempting to
acquire those locks will transfer them before attempting to acquire
the lock itself, so it should be OK.
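
So the weak-lock branch of the pseudocode above boils down to
something like this, with no "transfer all" step: a nonzero count
simply sends us to normal_LockAcquireEx (sketch only, invented names;
a false return means "take the slow path"):

#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

#define STRONG_LOCK_PARTITIONS 16

static int strong_lock_counts[STRONG_LOCK_PARTITIONS];
static pthread_mutex_t my_fastpath_lock = PTHREAD_MUTEX_INITIALIZER;

static int
partition_for(uint32_t locktag)
{
    return locktag % STRONG_LOCK_PARTITIONS;
}

static bool
fp_try_insert(uint32_t locktag)
{
    (void) locktag;             /* would record the lock in a free local slot */
    return true;
}

/*
 * Weak (<= RowExclusiveLock) acquisition.  Taking our own per-backend
 * lock before reading strong_lock_counts[] doubles as the needed memory
 * barrier, so no explicit fence appears on this path.
 */
static bool
fast_path_try_acquire(uint32_t locktag)
{
    bool        ok = false;

    pthread_mutex_lock(&my_fastpath_lock);
    if (strong_lock_counts[partition_for(locktag)] == 0)
        ok = fp_try_insert(locktag);    /* may fail if all slots are in use */
    pthread_mutex_unlock(&my_fastpath_lock);

    return ok;
}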

> To validate the locking at this level of detail, I think we need to sketch the
> unlock protocol, too.  On each strong lock release, we'll decrement the
> strong_lock_counts cell.  No particular interlock with fast-path lockers
> should be needed; a stray AccessShareLock needlessly making it into the global
> lock table is no problem.  As mentioned above, we _will_ need an interlock
> with lock transfer operations.  How will transferred fast-path locks get
> removed from the global lock table?  Presumably, the original fast-path locker
> should do so at transaction end; anything else would contort the life cycle.
> Then add a way for the backend to know which locks had been transferred as
> well as an interlock against concurrent transfer operations.  Maybe that's
> all.

I'm thinking that the backend can note, in its local-lock table,
whether it originally acquired a lock via the fast-path or not. Any
lock not originally acquired via the fast-path will be released just
as now. For any lock that WAS originally acquired via the fast-path,
we'll take our own per-backend lwlock, which protects the fast-path
queue, and scan the fast-path queue for a matching entry. If none is
found, then we know the lock was transferred, so release the
per-backend lwlock and do it the regular way (take lock manager
partition lock, etc.).
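
In sketch form, that release path might look like this (invented
names again; the stubs stand in for scanning the fast-path queue and
for the existing shared-table release path):

#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

typedef struct LocalLockEntry
{
    uint32_t    locktag;
    bool        acquired_via_fastpath;  /* noted at acquisition time */
} LocalLockEntry;

static pthread_mutex_t my_fastpath_lock = PTHREAD_MUTEX_INITIALIZER;

static bool
fp_remove_if_present(uint32_t locktag)
{
    /* would scan this backend's fast-path slots and clear a match */
    (void) locktag;
    return true;
}

static void
shared_table_release(uint32_t locktag)
{
    /* would take the lock-manager partition lock and remove the entry */
    (void) locktag;
}

static void
release_one_lock(LocalLockEntry *ll)
{
    if (ll->acquired_via_fastpath)
    {
        bool        found;

        pthread_mutex_lock(&my_fastpath_lock);
        found = fp_remove_if_present(ll->locktag);
        pthread_mutex_unlock(&my_fastpath_lock);

        if (found)
            return;             /* the lock never reached the shared table */
        /* else a strong locker transferred it; fall through */
    }
    shared_table_release(ll->locktag);
}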

At transaction end, we need to release all non-session locks, so we
can consolidate a bit to avoid excess locking and unlocking. Take the
per-backend lwlock just once and scan through the queue. Any locks we
find there (that are not session locks) can be nuked from the
local-lock table and we're done. Release the per-backend lwlock. At
this point, any remaining locks that need to be released are in the
shared hash tables and we can proceed as now (see LockReleaseAll -
basically, we iterate over the lock partitions).
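
Roughly like this (a sketch with invented names; whether the
session-lock flag lives in the fast-path slot or only in the
local-lock table is a detail, and the final call stands in for the
existing LockReleaseAll partition scan):

#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

#define FP_SLOTS_PER_BACKEND 16

typedef struct FastPathSlot
{
    uint32_t    locktag;
    bool        in_use;
    bool        is_session_lock;
} FastPathSlot;

static pthread_mutex_t my_fastpath_lock = PTHREAD_MUTEX_INITIALIZER;
static FastPathSlot my_fastpath_slots[FP_SLOTS_PER_BACKEND];

static void
locallock_forget(uint32_t locktag)
{
    (void) locktag;             /* would drop the local-lock table entry */
}

static void
shared_table_release_all(void)
{
    /* would iterate the lock partitions, as LockReleaseAll does now */
}

static void
release_locks_at_xact_end(void)
{
    pthread_mutex_lock(&my_fastpath_lock);
    for (int i = 0; i < FP_SLOTS_PER_BACKEND; i++)
    {
        FastPathSlot *slot = &my_fastpath_slots[i];

        if (slot->in_use && !slot->is_session_lock)
        {
            slot->in_use = false;
            locallock_forget(slot->locktag);
        }
    }
    pthread_mutex_unlock(&my_fastpath_lock);

    /* Transferred and slow-path locks are in the shared hash tables. */
    shared_table_release_all();
}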

> To put it another way: the current system is fair; the chance of hitting lock
> exhaustion is independent of lock level.  The new system would be unfair; lock
> exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock
> acquisition, through no fault of that transaction.  I agree this isn't ideal,
> but it doesn't look to me like an unacceptable weakness.  Making lock slots
> first-come, first-served is inherently unfair; we're not at all set up to
> justly arbitrate between mutually-hostile lockers competing for slots.  The
> overall situation will get better, not worse, for the admin who wishes to
> defend against hostile unprivileged users attempting a lock table DOS.

Well, it's certainly true that the proposed system is far less likely
to bomb out trying to acquire an AccessShareLock than what we have
today, since in the common case the AccessShareLock doesn't use up any
shared resources. And that should make a lot of people happy. But as
to the bad scenario, one needn't presume that the lockers are hostile
- it may just be that the system is running on the edge of a full lock
table. In the worst case, someone wanting a strong lock on a table
may end up transferring a hundred or more locks (up to one per
backend) into the shared lock table. Even if they all fit and the
strong locker
gets his lock, it may now happen that the space is just about
exhausted and other transactions start rolling back, apparently at
random.

Or, even more pathologically, one backend grabs a strong lock on table
X, but it just so happens that there is another table Y in the same
lock partition which is highly trafficked but, as all of the locks
involved are weak, uncontended. Now that strong_lock_counts[] is
non-zero for that partition, all those locks start going into the main
lock manager table. Performance will drop, which isn't great but we
can live with it. But maybe all the locks don't fit. Now we have a
situation in which, due to one backend acquiring a strong lock on
table X, a bunch of other backends are unable to obtain weak locks on
table Y, and transactions start rolling back all over the place.

Now maybe you can argue that this scenario is sufficiently unlikely
in practice that we shouldn't worry about it, but if it does happen
the DBA will be incredibly confused, because an apparently innocuous
operation on one table will have caused rollbacks of transactions
acquiring locks on an apparently unrelated table. Even granting that
it's rare, the unpleasantness factor seems pretty high to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
