Re: Spinlock performance improvement proposal

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spinlock performance improvement proposal
Date: 2001-09-28 17:11:29
Message-ID: 200109281711.f8SHBTj04258@candle.pha.pa.us
Lists: pgsql-hackers


Good summary. I agree the checkpoint process should look as much like a
normal PROC as possible.

> At the just-past OSDN database conference, Bruce and I were annoyed by
> some benchmark results showing that Postgres performed poorly on an
> 8-way SMP machine. Based on past discussion, it seems likely that the
> culprit is the known inefficiency in our spinlock implementation.
> After chewing on it for a while, we came up with an idea for a solution.
>
> The following proposal should improve performance substantially when
> there is contention for a lock, but it creates no portability risks
> because it uses the same system facilities (TAS and SysV semaphores)
> that we have always relied on. Also, I think it'd be fairly easy to
> implement --- I could probably get it done in a day.
>
> Comments anyone?
>
> regards, tom lane
>
>
> Plan:
>
> Replace most uses of spinlocks with "lightweight locks" (LW locks)
> implemented by a new lock manager. The principal remaining use of true
> spinlocks (TAS locks) will be to provide mutual exclusion of access to
> LW lock structures. Therefore, we can assume that spinlocks are never
> held for more than a few dozen instructions --- and never across a kernel
> call.
>
> It's pretty easy to rejigger the spinlock code to work well when the lock
> is never held for long. We just need to change the spinlock retry code
> so that it does a tight spin (continuous retry) for a few dozen cycles ---
> ideally, the total delay should be some small multiple of the max expected
> lock hold time. If the lock is still not acquired, yield the CPU via a select()
> call (10 msec minimum delay) and repeat. Although this looks inefficient,
> it doesn't matter on a uniprocessor because we expect that backends will
> only rarely be interrupted while holding the lock, so in practice a held
> lock will seldom be encountered. On SMP machines the tight spin will win
> since the lock will normally become available before we give up and yield
> the CPU.
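>
> A rough sketch of what the revised retry loop might look like (the
> spin count, function name, and the exact TAS() interface shown here are
> illustrative, not a final implementation):
>
>     #include <sys/time.h>
>     #include <unistd.h>
>
>     #define SPINS_BEFORE_YIELD  100     /* a few dozen tight-spin tries */
>
>     static void
>     acquire_spinlock(volatile slock_t *lock)
>     {
>         for (;;)
>         {
>             int     i;
>
>             /* tight spin: the lock is never held long, so retry hard */
>             for (i = 0; i < SPINS_BEFORE_YIELD; i++)
>             {
>                 if (TAS(lock) == 0)
>                     return;     /* acquired */
>             }
>
>             /* still held: give up the CPU for ~10 msec, then retry */
>             {
>                 struct timeval  delay;
>
>                 delay.tv_sec = 0;
>                 delay.tv_usec = 10000;
>                 select(0, NULL, NULL, NULL, &delay);
>             }
>         }
>     }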
>
> Desired properties of the LW lock manager include:
> * very fast fall-through when no contention for lock
> * waiting proc does not spin
> * support both exclusive and shared (read-only) lock modes
> * grant lock to waiters in arrival order (no starvation)
> * small lock structure to allow many LW locks to exist.
>
> Proposed contents of LW lock structure:
>
>     spinlock mutex (protects LW lock state and PROC queue links)
>     count of exclusive holders (always 0 or 1)
>     count of shared holders (0 .. MaxBackends)
>     queue head pointer (NULL or ptr to PROC object)
>     queue tail pointer (could do without this to save space)
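>
> In C, the structure might look roughly like this (field names are
> illustrative only):
>
>     typedef struct LWLock
>     {
>         slock_t     mutex;      /* protects LW lock state and queue links */
>         char        exclusive;  /* # of exclusive holders (0 or 1) */
>         int         shared;     /* # of shared holders (0..MaxBackends) */
>         PROC       *head;       /* head of wait queue, or NULL */
>         PROC       *tail;       /* tail of wait queue (optional) */
>     } LWLock;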
>
> If a backend sees it must wait to acquire the lock, it adds its PROC
> struct to the end of the queue, releases the spinlock mutex, and then
> sleeps by P'ing its per-backend wait semaphore. A backend releasing the
> lock will check to see if any waiter should be granted the lock. If so,
> it will update the lock state, release the spinlock mutex, and finally V
> the wait semaphores of any backends that it decided should be released
> (which it removed from the lock's queue while holding the mutex). Notice
> that no kernel calls need be done while holding the spinlock. Since the
> wait semaphore will remember a V occurring before P, there's no problem
> if the releaser is fast enough to release the waiter before the waiter
> reaches its P operation.
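>
> In outline, the acquire-side wait path might look like this (helper and
> field names such as SemaphoreP() and lwWaitLink are placeholders, not
> existing code):
>
>     SpinAcquire(&lock->mutex);
>     if (must_wait)
>     {
>         /* append our PROC to the end of the lock's wait queue */
>         MyProc->lwWaitLink = NULL;
>         if (lock->head == NULL)
>             lock->head = MyProc;
>         else
>             lock->tail->lwWaitLink = MyProc;
>         lock->tail = MyProc;
>
>         SpinRelease(&lock->mutex);
>         SemaphoreP(MyProc);     /* sleep; the releaser will V us */
>     }
>     else
>     {
>         /* no contention: bump the holder count and fall through */
>         SpinRelease(&lock->mutex);
>     }
>
> The release side is the mirror image: under the mutex it updates the
> holder counts, unlinks any waiters that can now be granted the lock,
> releases the mutex, and only then V's those waiters' semaphores.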
>
> We will need to add a few fields to PROC structures:
> * Flag to show whether PROC is waiting for an LW lock, and if so
> whether it waits for read or write access
> * Additional PROC queue link field.
> We can't reuse the existing queue link field because it is possible for a
> PROC to be waiting for both a heavyweight lock and a lightweight one ---
> this will occur when HandleDeadLock or LockWaitCancel tries to acquire
> the LockMgr module's lightweight lock (formerly spinlock).
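>
> Concretely, something like the following fields (placeholder names):
>
>     /* additions to struct proc */
>     bool            lwWaiting;      /* true if waiting for an LW lock */
>     bool            lwExclusive;    /* true if waiting for exclusive mode */
>     struct proc    *lwWaitLink;     /* next waiter in an LW lock's queue */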
>
> It might seem that we also need to create a second wait semaphore per
> backend, one to wait on HW locks and one to wait on LW locks. But I
> believe we can get away with just one, by recognizing that a wait for an
> LW lock can never be interrupted by a wait for a HW lock, only vice versa.
> After being awoken (V'd), the LW lock manager must check to see if it was
> actually granted the lock (easiest way: look at own PROC struct to see if
> LW lock wait flag has been cleared). If not, the V must have been to
> grant us a HW lock --- but we still have to sleep to get the LW lock. So
> remember this happened, then loop back and P again. When we finally get
> the LW lock, if there was an extra P operation then V the semaphore once
> before returning. This will allow ProcSleep to exit the wait for the HW
> lock when we return to it.
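>
> Sketched as code, the wait loop might become (again, placeholder names):
>
>     int     extraWaits = 0;
>
>     for (;;)
>     {
>         SemaphoreP(MyProc);     /* sleep until someone V's our semaphore */
>         if (!MyProc->lwWaiting)
>             break;              /* that V granted us the LW lock */
>         extraWaits++;           /* V was for a HW lock; keep waiting */
>     }
>
>     /* re-issue any absorbed V's so ProcSleep sees its HW-lock wakeup */
>     while (extraWaits-- > 0)
>         SemaphoreV(MyProc);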
>
> Fine points:
>
> While waiting for an LW lock, we need to show in our PROC struct whether
> we are waiting for read or write access. But we don't need to remember
> this after getting the lock; if we know we have the lock, it's easy to
> see by inspecting the lock whether we hold read or write access.
>
> ProcStructLock cannot be replaced by an LW lock, since a backend cannot
> use an LW lock until it has obtained a PROC struct and a semaphore,
> both of which are protected by this lock. It seems okay to use a plain
> spinlock for this purpose. NOTE: it's okay for SInvalLock to be an LW
> lock, as long as the LW mgr does not depend on accessing the SI array
> of PROC objects, but only chains through the PROCs themselves.
>
> Another tricky point is that some of the setup code executed by the
> postmaster may try to grab/release LW locks. Here, we can probably
> allow a special case for MyProc=NULL. It's likely that we should never
> see a block under these circumstances anyway, so finding MyProc=NULL when
> we need to block may just be a fatal error condition.
>
> A nastier case is checkpoint processes; these expect to grab BufMgr and
> WAL locks. Perhaps okay for them to do plain sleeps in between attempts
> to grab the locks? This says that the MyProc=NULL case should release the
> spinlock mutex, sleep 10 msec, and try again, rather than raise an error
> or assume there can be no conflict. Are there any cases where this represents
> a horrid performance loss? Checkpoint itself seems noncritical.
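>
> That is, the blocking path could handle the no-PROC case with something
> like this (sketch only):
>
>     if (MyProc == NULL)
>     {
>         struct timeval  delay;
>
>         /* no semaphore to sleep on: back off, then retry from the top */
>         SpinRelease(&lock->mutex);
>         delay.tv_sec = 0;
>         delay.tv_usec = 10000;      /* ~10 msec */
>         select(0, NULL, NULL, NULL, &delay);
>         continue;                   /* loop back and re-examine the lock */
>     }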
>
> An alternative is for the checkpoint process to be allowed to create a
> PROC struct (but not to enter it in the SI list) so it can participate
> normally in LW lock operations. That seems a good idea anyway, actually,
> so that the PROC struct's facility for releasing held LW locks at elog
> time will work inside the checkpointer. (But that means we need an extra
> sema too? Okay, but we don't want a would-be backend to obtain that extra
> sema and perhaps cause a checkpoint proc to fail. So we must allocate the
> PROC and sema for the checkpoint process separately from those reserved
> for backends.)
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
