Re: Reworking WAL locking

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Paul van den Bogaard <Paul(dot)Vandenbogaard(at)Sun(dot)COM>
Subject: Re: Reworking WAL locking
Date: 2008-03-23 00:05:16
Lists: pgsql-hackers

Added to TODO:

* Improve WAL concurrency by increasing lock granularity


Simon Riggs wrote:
> Paul van den Bogaard (Sun) suggested to me that we could use more than
> two WAL locks to improve concurrency. I think it's possible to introduce
> such a scheme with some ease. All mods are within xlog.c.
>
> The scheme below requires an extra LWLock per WAL buffer.
>
> Locking within XLogInsert() would look like this:
>
>   Calculate length of data to be inserted.
>   Calculate initial CRC
>
>   LWLockAcquire(WALInsertLock, LW_EXCLUSIVE)
>
>   Reserve space to write into.
>   LSN = current Insert pointer
>   Move pointer forward by length of data to be inserted, acquiring
>   WALWriteLock if required to ensure space is available.
>
>   LWLockAcquire(LSNGetWALPageLockId(LSN), LW_SHARED);
>
>   Note that we don't lock every page, just the first one of the set we
>   want, but we hold it until all page writes are complete.
>
>   LWLockRelease(WALInsertLock);
>
>   finish calculating CRC
>   write xlog into reserved space
>
>   LWLockRelease(LSNGetWALPageLockId(LSN));
>
> XLogWrite() will then try to get a conditional LW_EXCLUSIVE lock
> sequentially on each page it plans to write. It keeps going until it
> fails to get the lock, then writes. Callers of XLogWrite() will never be
> able to pass a backend currently performing the WAL buffer fill.
> We write a whole page at a time.
>
> Next time, we do a regular lock wait on the same page, so that we always
> get a page eventually.
>
> This requires us to get 2 locks for an XLogInsert() rather than just one.
> However, the second lock is always acquired with zero wait time when the
> wal_buffers are sensibly sized. Overall this should reduce wait time for
> the WALInsertLock, since it seems likely that each actual filling of WAL
> buffers will affect different cache lines and so can very likely be
> performed in parallel.
>
> Sounds good to me.
> Any objections/comments before this can be tried out?
> --
> Simon Riggs
> 2ndQuadrant

Bruce Momjian <bruce(at)momjian(dot)us>

+ If your life is a hard drive, Christ can be your backup. +
