Re: Spinlock performance improvement proposal

From: "Marc G(dot) Fournier" <scrappy(at)hub(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spinlock performance improvement proposal
Date: 2001-09-26 17:18:38
Message-ID: 20010926131453.H58361-100000@mail1.hub.org
Lists: pgsql-hackers


Sounds cool to me ... definitely something to fix before v7.2, if it's as
"easy" as you make it sound ... I'm expecting the new drive to be
installed today (if all goes well) ... Thomas still has his date/time stuff
to finish off, now that CVSup is fixed ...

Let's try to target Monday for Beta then? I think the only two
outstanding items are you and Thomas right now?

Bruce, that latest rtree patch looks intriguing also ... can anyone
comment, positive or negative, on it, so that we can try to get it in
before Beta?

On Wed, 26 Sep 2001, Tom Lane wrote:

> At the just-past OSDN database conference, Bruce and I were annoyed by
> some benchmark results showing that Postgres performed poorly on an
> 8-way SMP machine. Based on past discussion, it seems likely that the
> culprit is the known inefficiency in our spinlock implementation.
> After chewing on it for awhile, we came up with an idea for a solution.
>
> The following proposal should improve performance substantially when
> there is contention for a lock, but it creates no portability risks
> because it uses the same system facilities (TAS and SysV semaphores)
> that we have always relied on. Also, I think it'd be fairly easy to
> implement --- I could probably get it done in a day.
>
> Comments anyone?
>
> regards, tom lane
>
>
> Plan:
>
> Replace most uses of spinlocks with "lightweight locks" (LW locks)
> implemented by a new lock manager. The principal remaining use of true
> spinlocks (TAS locks) will be to provide mutual exclusion of access to
> LW lock structures. Therefore, we can assume that spinlocks are never
> held for more than a few dozen instructions --- and never across a kernel
> call.
>
> It's pretty easy to rejigger the spinlock code to work well when the lock
> is never held for long. We just need to change the spinlock retry code
> so that it does a tight spin (continuous retry) for a few dozen cycles ---
> ideally, the total delay should be some small multiple of the max expected
> lock hold time. If lock still not acquired, yield the CPU via a select()
> call (10 msec minimum delay) and repeat. Although this looks inefficient,
> it doesn't matter on a uniprocessor because we expect that backends will
> only rarely be interrupted while holding the lock, so in practice a held
> lock will seldom be encountered. On SMP machines the tight spin will win
> since the lock will normally become available before we give up and yield
> the CPU.
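
Roughly, the retry loop described above might look like the sketch below.
TAS(), the spin count, and the select() call are stand-ins taken from the
description, not the actual s_lock.c code:

    #include <sys/types.h>
    #include <sys/time.h>
    #include <unistd.h>

    typedef volatile int slock_t;

    extern int TAS(slock_t *lock);      /* returns 0 if the lock was obtained */

    #define SPINS_BEFORE_YIELD  100     /* "a few dozen" tight retries */

    void
    s_lock_sketch(slock_t *lock)
    {
        for (;;)
        {
            int         spins;

            /* tight spin: continuous retry for a short while */
            for (spins = 0; spins < SPINS_BEFORE_YIELD; spins++)
            {
                if (TAS(lock) == 0)
                    return;             /* got it */
            }

            /* still not acquired: yield the CPU for ~10 msec via select() */
            {
                struct timeval  delay;

                delay.tv_sec = 0;
                delay.tv_usec = 10000;  /* 10 msec minimum delay */
                (void) select(0, NULL, NULL, NULL, &delay);
            }
        }
    }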
>
> Desired properties of the LW lock manager include:
> * very fast fall-through when no contention for lock
> * waiting proc does not spin
> * support both exclusive and shared (read-only) lock modes
> * grant lock to waiters in arrival order (no starvation)
> * small lock structure to allow many LW locks to exist.
>
> Proposed contents of LW lock structure:
>
> spinlock mutex (protects LW lock state and PROC queue links)
> count of exclusive holders (always 0 or 1)
> count of shared holders (0 .. MaxBackends)
> queue head pointer (NULL or ptr to PROC object)
> queue tail pointer (could do without this to save space)
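
In struct form, that might look something like the following; the field names
are illustrative only, and slock_t / PROC are assumed to be the existing
spinlock and per-backend process types:

    struct PROC;                        /* per-backend process structure */
    typedef volatile int slock_t;       /* TAS spinlock type */

    typedef struct LWLock
    {
        slock_t      mutex;             /* protects the fields below and the
                                         * PROC queue links */
        char         exclusive;         /* # of exclusive holders, 0 or 1 */
        int          shared;            /* # of shared holders,
                                         * 0 .. MaxBackends */
        struct PROC *head;              /* head of the queue of waiting
                                         * PROCs, or NULL if none */
        struct PROC *tail;              /* tail of the queue (could be
                                         * dropped to save space, at the cost
                                         * of walking the list to append) */
    } LWLock;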
>
> If a backend sees it must wait to acquire the lock, it adds its PROC
> struct to the end of the queue, releases the spinlock mutex, and then
> sleeps by P'ing its per-backend wait semaphore. A backend releasing the
> lock will check to see if any waiter should be granted the lock. If so,
> it will update the lock state, release the spinlock mutex, and finally V
> the wait semaphores of any backends that it decided should be released
> (which it removed from the lock's queue while holding the sema). Notice
> that no kernel calls need be done while holding the spinlock. Since the
> wait semaphore will remember a V occurring before P, there's no problem
> if the releaser is fast enough to release the waiter before the waiter
> reaches its P operation.
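
A pseudocode-level sketch of that protocol is below.  SpinAcquireMutex /
SpinReleaseMutex stand in for the TAS mutex operations, P / V for the
per-backend semaphore operations, PROC is the (typedef'd) per-backend struct
with the lwWaiting / lwExclusive / lwWaitLink additions described just below,
Postgres' own bool typedef is assumed, and the queue helpers are hypothetical;
only the ordering of the steps is meant to match the description:

    extern PROC *MyProc;

    extern void SpinAcquireMutex(slock_t *mutex);   /* tight-spin TAS loop */
    extern void SpinReleaseMutex(slock_t *mutex);
    extern void P(PROC *proc);          /* sleep on proc's wait semaphore */
    extern void V(PROC *proc);          /* wake proc's wait semaphore */

    /* hypothetical state/queue helpers, all called with the mutex held */
    extern bool  lock_free_for(LWLock *lock, bool exclusive);
    extern void  append_waiter(LWLock *lock, PROC *proc);
    extern void  release_our_hold(LWLock *lock);
    extern PROC *unlink_grantable_waiters(LWLock *lock);

    void
    lw_acquire_sketch(LWLock *lock, bool exclusive)
    {
        SpinAcquireMutex(&lock->mutex);

        if (lock_free_for(lock, exclusive))
        {
            /* no contention: just bump the proper holder count */
            if (exclusive)
                lock->exclusive++;
            else
                lock->shared++;
            SpinReleaseMutex(&lock->mutex);
            return;
        }

        /* must wait: join the tail of the queue, then sleep */
        MyProc->lwWaiting = true;
        MyProc->lwExclusive = exclusive;
        append_waiter(lock, MyProc);
        SpinReleaseMutex(&lock->mutex);

        /*
         * The releaser updates the lock state and clears lwWaiting on our
         * behalf before V'ing us, so a "real" wakeup means we already hold
         * the lock.  (Telling such wakeups apart from V's meant for a
         * heavyweight lock is sketched further down.)
         */
        P(MyProc);
    }

    void
    lw_release_sketch(LWLock *lock)
    {
        PROC   *wakeup;

        SpinAcquireMutex(&lock->mutex);
        release_our_hold(lock);                     /* drop our holder count */
        wakeup = unlink_grantable_waiters(lock);    /* waiters are removed from
                                                     * the queue, and granted
                                                     * the lock, while we still
                                                     * hold the mutex */
        SpinReleaseMutex(&lock->mutex);

        /* no kernel calls were made while holding the spinlock; V them now */
        while (wakeup != NULL)
        {
            PROC   *next = wakeup->lwWaitLink;

            V(wakeup);
            wakeup = next;
        }
    }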
>
> We will need to add a few fields to PROC structures:
> * Flag to show whether PROC is waiting for an LW lock, and if so
> whether it waits for read or write access
> * Additional PROC queue link field.
> We can't reuse the existing queue link field because it is possible for a
> PROC to be waiting for both a heavyweight lock and a lightweight one ---
> this will occur when HandleDeadLock or LockWaitCancel tries to acquire
> the LockMgr module's lightweight lock (formerly spinlock).
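
Concretely, the additions might look like this; the names are illustrative
only (Postgres' own bool typedef is assumed):

    typedef struct PROC
    {
        /*
         * ... existing fields: wait semaphore, heavyweight-lock queue links,
         * and so on ...
         */

        bool         lwWaiting;         /* true if waiting for an LW lock */
        bool         lwExclusive;       /* true if the wait is for exclusive
                                         * (write) rather than shared access */
        struct PROC *lwWaitLink;        /* next waiter in the LW lock's queue;
                                         * separate from the heavyweight-lock
                                         * link because both waits can be
                                         * pending at once */
    } PROC;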
>
> It might seem that we also need to create a second wait semaphore per
> backend, one to wait on HW locks and one to wait on LW locks. But I
> believe we can get away with just one, by recognizing that a wait for an
> LW lock can never be interrupted by a wait for a HW lock, only vice versa.
> After being awoken (V'd), the LW lock manager must check to see if it was
> actually granted the lock (easiest way: look at own PROC struct to see if
> LW lock wait flag has been cleared). If not, the V must have been to
> grant us a HW lock --- but we still have to sleep to get the LW lock. So
> remember this happened, then loop back and P again. When we finally get
> the LW lock, if there was an extra P operation then V the semaphore once
> before returning. This will allow ProcSleep to exit the wait for the HW
> lock when we return to it.
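
The wakeup loop might then look like the sketch below (hypothetical names
again, building on the declarations in the earlier sketches):

    void
    lw_wait_sketch(void)
    {
        int     extraWaits = 0;

        for (;;)
        {
            P(MyProc);                  /* sleep on our wait semaphore */

            if (!MyProc->lwWaiting)
                break;                  /* that V granted us the LW lock */

            /* the V must have been for a heavyweight lock; sleep again */
            extraWaits++;
        }

        /*
         * Repay the borrowed wakeups, so that ProcSleep still finds its V
         * when control returns to the heavyweight-lock wait.
         */
        while (extraWaits-- > 0)
            V(MyProc);
    }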
>
> Fine points:
>
> While waiting for an LW lock, we need to show in our PROC struct whether
> we are waiting for read or write access. But we don't need to remember
> this after getting the lock; if we know we have the lock, it's easy to
> see by inspecting the lock whether we hold read or write access.
>
> ProcStructLock cannot be replaced by an LW lock, since a backend cannot
> use an LW lock until it has obtained a PROC struct and a semaphore,
> both of which are protected by this lock. It seems okay to use a plain
> spinlock for this purpose. NOTE: it's okay for SInvalLock to be an LW
> lock, as long as the LW mgr does not depend on accessing the SI array
> of PROC objects, but only chains through the PROCs themselves.
>
> Another tricky point is that some of the setup code executed by the
> postmaster may try to grab/release LW locks. Here, we can probably
> allow a special case for MyProc=NULL. It's likely that we should never
> see a block under these circumstances anyway, so finding MyProc=NULL when
> we need to block may just be a fatal error condition.
>
> A nastier case is checkpoint processes; these expect to grab BufMgr and
> WAL locks. Perhaps okay for them to do plain sleeps in between attempts
> to grab the locks? This says that the MyProc=NULL case should release
> the spinlock mutex, sleep 10 msec, try again, rather than any sort of error
> or expectation of no conflict. Are there any cases where this represents
> a horrid performance loss? Checkpoint itself seems noncritical.
>
> Alternative is for checkpoint to be allowed to create a PROC struct (but
> not to enter it in SI list) so's it can participate normally in LW lock
> operations. That seems a good idea anyway, actually, so that the PROC
> struct's facility for releasing held LW locks at elog time will work
> inside the checkpointer. (But that means we need an extra sema too?
> Okay, but don't want an extra would-be backend to obtain the extra sema
> and perhaps cause a checkpoint proc to fail. So must allocate the PROC
> and sema for checkpoint process separately from those reserved for
> backends.)
>
