Reworking WAL locking

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Paul van den Bogaard <Paul(dot)Vandenbogaard(at)Sun(dot)COM>
Subject: Reworking WAL locking
Date: 2008-02-14 13:44:18
Message-ID: 1202996658.16770.664.camel@ebony.site
Lists: pgsql-hackers

Paul van den Bogaard (Sun) suggested to me that we could use more than
two WAL locks to improve concurrency. I think it's possible to introduce
such a scheme with some ease. All of the changes would be within xlog.c.

The scheme below requires an extra LWLock per WAL buffer.
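
As a rough illustration of "one lock per WAL buffer", the lookup done by
LSNGetWALPageLockId() in the steps below might be sketched as follows.
This is only a sketch under stated assumptions: the LSN is treated as a
flat 64-bit byte position (FlatLSN), and N_WAL_BUFFERS,
FIRST_WAL_PAGE_LOCK and LSNGetWALBufferIndex() are hypothetical names
invented for the example, not existing xlog.c symbols.

```c
#include <stdint.h>

#define XLOG_BLCKSZ         8192  /* WAL page size (PostgreSQL's default) */
#define N_WAL_BUFFERS       8     /* hypothetical number of WAL buffers */
#define FIRST_WAL_PAGE_LOCK 100   /* hypothetical id of the first per-buffer lock */

/* Assumption: an LSN is a flat byte offset into the WAL stream. */
typedef uint64_t FlatLSN;

/* Index of the WAL buffer slot holding the page that contains lsn. */
static int
LSNGetWALBufferIndex(FlatLSN lsn)
{
    return (int) ((lsn / XLOG_BLCKSZ) % N_WAL_BUFFERS);
}

/* Id of the per-buffer lock that protects the page containing lsn. */
static int
LSNGetWALPageLockId(FlatLSN lsn)
{
    return FIRST_WAL_PAGE_LOCK + LSNGetWALBufferIndex(lsn);
}
```

With these assumed values, every LSN on the first page maps to lock id
100, the next page maps to 101, and the ids wrap around after
N_WAL_BUFFERS pages.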

Locking within XLogInsert() would look like this:

Calculate length of data to be inserted.
Calculate initial CRC

LWLockAcquire(WALInsertLock, LW_EXCLUSIVE)

Reserve space to write into. 
LSN = current Insert pointer
Move pointer forward by length of data to be inserted, acquiring
WALWriteLock if required to ensure space is available.

LWLockAcquire(LSNGetWALPageLockId(LSN), LW_SHARED);

Note that we don't lock every page, just the first one of the set we
want, but we hold it until all page writes are complete.

LWLockRelease(WALInsertLock);

Finish calculating CRC.
Write xlog into reserved space.

LWLockRelease(LSNGetWALPageLockId(LSN));

XLogWrite() will then try to get a conditional LW_EXCLUSIVE lock
sequentially on each page it plans to write. It keeps going until it
fails to get a lock, then writes the pages it has locked. Callers of
XLogWrite() will never be able to pass a backend that is currently
filling a WAL buffer.

We write a whole page at a time.

Next time, we do a regular (blocking) lock wait on the page where the
conditional acquire failed, so that we always get a page eventually.

This requires us to get two locks for an XLogInsert() rather than just
one. However, the second lock is always acquired with zero wait time
when wal_buffers is sensibly sized. Overall this should reduce wait time
for the WALInsertLock, since it seems likely that each actual filling of
WAL buffers will affect different cache lines and so is very likely to
be able to be performed in parallel.

Sounds good to me.

Any objections/comments before this can be tried out? 

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

