Re: Thinking about breaking up the BufMgrLock

From: Kenneth Marshall <ktm(at)it(dot)is(dot)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Thinking about breaking up the BufMgrLock
Date: 2005-02-07 14:14:28
Message-ID: 20050207141428.GA2580@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 06, 2005 at 07:30:37PM -0500, Tom Lane wrote:
>
> ReadBuffer needs to do a lookup to map the page ID to a buffer ID,
> which in principle requires only a shared lock on the page-to-buffer
> mapping (embodied in the buf_table hash table). Assuming success, it
> also needs to mark the buffer pinned and update the LRU-list position
> of the buffer. Marking pinned is certainly a buffer-local change,
> so we could imagine splitting up the BufMgrLock like this:
>
> 1. A global LWLock for the page-to-buffer mapping, call it say
> BufMappingLock. Share lock on this is sufficient to allow reading the
> hash table; must get exclusive lock when reassigning any buffer's page
> identity.
>
> 2. A global LWLock for the LRU strategy information, say BufLRULock
> or BufStrategyLock.
>
> 3. Per-buffer LWLocks (or maybe just spinlocks) protecting the state of
> each buffer header; you need this lock to examine or change a buffer
> header.
>
> ReleaseBuffer has no locking problems in this formulation: it just grabs
> the per-buffer lock, decrements its refcount, releases the lock.
>
For the per-buffer, a latch would provide a lightweight method of updating
the contents of the buffer without hampering the read-only access. A latch
is comprised of a latch bit and a sequence number that can be set in an
atomic action. The flow for the two cases is simple:

Write:
1. Get latch.
2. Update the buffer.
3. Increment the sequence number.
4. Release the latch.

Read:
1. Read version number.
2. Read buffer.
3. Check latch. If latched, go to 1.
4. If version number has changed, go to 1.

By using this process, readers will only see a consistent state of
the buffer. Also, since the read does not entail a write operation
it will not cause a cache line update and contribute to the a cache
update "storm". The "get latch" operation can be implemented using
an atomic operation such as TAS (test-and-set) and CAS (compare-and-set).
This would provide readers an extremely lightweight access to the
buffer - no cache line update hit. If you need to have sequenced access
to the buffer, then you would need to use LWLocks but in many cases
such as 3. in Tom's list a latch would work well.

> ReadBuffer looks like:
>
> * Acquire share lock on BufMappingLock.
> * Search hash table for desired ID. (we assume success)
> * acquire per-buffer lock.
> * increment buffer refcount.
> * release per-buffer lock.
> * release share lock on BufMappingLock.
> * update the LRU state.
>
> (We have to bump the pin count on the target buffer before releasing the
> BufMappingLock, otherwise someone could reassign the buffer as soon as
> we release BufMappingLock.)
>
> This would be pretty good from a locking point of view, except that
> "update the LRU state" seems to require taking an exclusive lock on a
> global data structure, which puts us about back where we were.
> Contention for a global BufLRULock might be a bit better than for the
> existing overall BufMgrLock, but it'll still cripple the performance
> of ReadBuffer.
>
> Perhaps someone knows of a neat data structure that can maintain an LRU
> list with only local updates? I don't though.
>
The clock algorithm is pretty close to this and provides an approximation
to LRU that eleminates the need to move buffers to the MRU position by
using a reference bit.

> This would convert the existing "strict LRU" behavior into an
> "approximate LRU". I'm worried that the change might be penny-wise and
> pound-foolish: if a poorer cache management algorithm causes us to have
> to do more I/O, it's probably a net loss to save some lock contention.
> But without a smart idea about data structures I don't see how to do
> better.
>

One alternative to an approximate LRU, such as the clock algorithm, would
be to have multiple buffer pools as we discussed in the previous thread.
The contention would be reduced by 1/N, where N is the number of pools.
Of course, buffers should be allocated in a fashion that would maximize
locality and minimize the effect of scan cache polution.

More food for thought.

Ken

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message pgsql 2005-02-07 14:30:45 Re: Query optimizer 8.0.1 (and 8.0)
Previous Message Florian Hars 2005-02-07 12:06:13 BUG #1467: fe_connect doesn't handle EINTR right