I remember Heikki mentioned improving redo recovery in one of the
emails in the past, so I know people are already thinking about this.
I have some ideas and just wanted to get comments here.
ISTM that it's important to keep the redo recovery time as small as possible
in order to reduce the downtime in case of unplanned maintenance.
One way to do this is to take checkpoints very aggressively to keep the
amount of redo work small. But the current checkpoint logic writes all
the dirty buffers to disk and hence generates lots of IO. That limits our
ability to take very frequent checkpoints.
The current redo recovery is a single-threaded, synchronous process.
The XLOG is read sequentially; each log record is examined and replayed
if required. This requires reading disk blocks into the shared buffers and
applying changes to the buffer. The reads happen synchronously, and
that usually makes the redo process very slow.
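To make the loop described above concrete, here is a toy model of it in Python. It is only a sketch: `LogRecord`, `Page`, `read_block` and `redo` are names made up for illustration, not the actual recovery code, and the "replayed if required" test is modeled as a simple comparison of the record's LSN against an LSN stored on the page.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int        # position of the record in the log
    block: int      # block the record modifies (one block per record, by assumption)
    payload: str    # stand-in for the change to apply


@dataclass
class Page:
    lsn: int = 0    # LSN of the last change already applied to this page
    data: str = ""


def read_block(disk, buffers, block):
    """Synchronous read: redo cannot proceed until the block is in the buffer pool."""
    if block not in buffers:
        src = disk.get(block, Page())
        buffers[block] = Page(src.lsn, src.data)
    return buffers[block]


def redo(log, disk, buffers):
    """Replay records strictly in log order; return how many were applied."""
    applied = 0
    for rec in log:
        page = read_block(disk, buffers, rec.block)  # the redo loop stalls here
        if page.lsn >= rec.lsn:
            continue                                 # change is already on the page
        page.data += rec.payload
        page.lsn = rec.lsn
        applied += 1
    return applied
```

Every iteration waits on `read_block` before the next record is even looked at, which is the serialization the rest of this mail is about.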
What I am thinking is that if we can read these blocks ahead into the shared
buffers and then apply redo changes to them, it can potentially improve things
a lot. If there are multiple outstanding read requests, the kernel (or controller?)
can schedule the reads more efficiently. One way to do this is to read ahead in the
XLOG and make asynchronous read requests for these blocks. But I am not
sure if we support asynchronous reads yet. Another (and maybe easier) way
is to fork another process which just reads ahead in the XLOG and gets the
blocks into memory while the other process does the normal redo recovery.
One obvious downside of reading ahead would be that we may need to
jump backward and forward in the XLOG file which is otherwise sequentially
read. But that can be handled by using XLOG buffers for redo.
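As a rough illustration of the read-ahead idea, the sketch below looks at the next few log records, collects the distinct blocks they touch, and asks the kernel to start fetching them. Note that it swaps in a different mechanism from the two mentioned above: instead of true asynchronous reads or a forked helper process, it uses the `posix_fadvise` hint with `POSIX_FADV_WILLNEED`, which is only advisory and only available on some platforms. `BLOCK_SIZE`, `blocks_to_prefetch` and `prefetch` are hypothetical names for this example.

```python
import os

BLOCK_SIZE = 8192  # assumed page size for this sketch


def blocks_to_prefetch(upcoming_blocks, window):
    """Distinct blocks touched by the next `window` records, in first-use order."""
    return list(dict.fromkeys(upcoming_blocks[:window]))


def prefetch(fd, blocks):
    """Hint the kernel that these blocks will be read soon.

    Returns the number of hints issued (0 where posix_fadvise is unavailable).
    """
    if not hasattr(os, "posix_fadvise"):
        return 0
    for block in blocks:
        os.posix_fadvise(fd, block * BLOCK_SIZE, BLOCK_SIZE,
                         os.POSIX_FADV_WILLNEED)
    return len(blocks)
```

The redo loop itself would stay unchanged; it would just find more of its blocks already in the OS cache when it gets to them. How large the look-ahead window should be is an open question.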
Btw, isn't our redo recovery completely physical in nature? I mean, can we
replay redo logs related to a block independently of other blocks? The reason
I am asking is that if that's the case, ISTM we can introduce parallelism in
recovery by splitting and reordering the xlog records and then running multiple
processes to do the redo recovery.
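A minimal sketch of the splitting idea, under the big assumption that every record touches exactly one block and is independent of all other blocks (exactly the question asked above; a record that spans several blocks would break this scheme). Records are assigned to a worker by block number, so the relative order of records for any one block is preserved. Names such as `partition_by_block` and `parallel_redo` are made up for this example, and threads stand in for the separate processes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_block(log, workers):
    """Split (lsn, block, payload) records into per-worker queues by block number.

    Records for the same block always land in the same queue, in log order.
    """
    queues = [[] for _ in range(workers)]
    for rec in log:
        _lsn, block, _payload = rec
        queues[block % workers].append(rec)
    return queues


def replay(queue):
    """Apply one worker's records in order; returns {block: contents}."""
    pages = defaultdict(str)
    for _lsn, block, payload in queue:
        pages[block] += payload
    return dict(pages)


def parallel_redo(log, workers=2):
    """Replay the log with one worker per queue and merge the results."""
    queues = partition_by_block(log, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(replay, queues))
    merged = {}
    for part in results:
        merged.update(part)  # workers own disjoint sets of blocks
    return merged
```

For example, `parallel_redo([(1, 0, "a"), (2, 1, "x"), (3, 0, "b")], workers=2)` replays block 0 on one worker and block 1 on the other, and each block still sees its own records in log order.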