Hot standby, recovery infra

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Hot standby, recovery infra
Date: 2009-01-28 10:04:57
Message-ID: 49802DC9.6000406@enterprisedb.com
Lists: pgsql-hackers
I've been reviewing and massaging the so-called recovery infra patch.

To recap, the goal is to:
- start background writer during (archive) recovery
- skip the shutdown checkpoint at the end of recovery. Instead, the 
database is brought up immediately, and the bgwriter performs a normal 
online checkpoint, while we're already accepting connections.
- keep track of when we reach a consistent point in the recovery, where 
we could let read-only backends in, which is obviously required for hot 
standby.

The 1st and 2nd points provide some useful functionality, even without 
the rest of the hot standby patch.

I've refactored the patch quite heavily, making it a lot simpler, and 
over 1/3 smaller than before:

The signaling between the bgwriter and startup process during recovery 
was quite complicated. The startup process periodically sent checkpoint 
records to the bgwriter, so that bgwriter could perform restart points. 
I've replaced that by storing the last seen checkpoint in shared 
memory in xlog.c. CreateRestartPoint() picks it up from there. This 
means that bgwriter can decide autonomously when to perform a restart 
point; it no longer needs to be told to do so by the startup process. 
Which is nice in a standby. What could happen before is that the standby 
processes a checkpoint record, and decides not to make it a restartpoint 
because not enough time has passed since the last one. If we then get a long 
idle period after that, we'd really want to make the previous checkpoint 
record a restart point after all, after some time has passed. That is 
what will happen now, which is a usability enhancement, although the 
real motivation for this refactoring was to make the code simpler.
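
To make the new division of labour concrete, here is a minimal sketch of 
the idea, not the code from the patch. The struct, the function names and 
the time-based threshold are all made up for illustration, and real code 
would need locking around the shared state. The startup process only 
records the latest checkpoint it has replayed; the bgwriter decides on 
its own whether enough time has passed to turn it into a restartpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state kept in shared memory in xlog.c. */
typedef struct
{
    uint64_t last_checkpoint_lsn;    /* last checkpoint record replayed */
    uint64_t last_restartpoint_lsn;  /* checkpoint used for the last restartpoint */
    int64_t  last_restartpoint_time; /* time of the last restartpoint, in seconds */
} RecoveryShared;

/* Startup process: remember the checkpoint record it just replayed. */
void
remember_checkpoint(RecoveryShared *shared, uint64_t lsn)
{
    assert(shared != NULL);
    shared->last_checkpoint_lsn = lsn;
}

/*
 * Bgwriter: called periodically. Returns true if a restartpoint was made.
 * A checkpoint that is skipped because it arrived too soon stays
 * remembered, so it is picked up later even if no more WAL arrives.
 */
bool
maybe_restartpoint(RecoveryShared *shared, int64_t now, int64_t min_interval)
{
    assert(shared != NULL);

    if (shared->last_checkpoint_lsn == shared->last_restartpoint_lsn)
        return false;           /* nothing new since the last restartpoint */
    if (now - shared->last_restartpoint_time < min_interval)
        return false;           /* too soon, try again on a later call */

    shared->last_restartpoint_lsn = shared->last_checkpoint_lsn;
    shared->last_restartpoint_time = now;
    return true;
}
```

With this shape, the idle-standby case falls out naturally: a later call 
to maybe_restartpoint() succeeds once min_interval has elapsed, without 
the startup process having to send anything.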

The bgwriter is now always responsible for all checkpoints and 
restartpoints (well, except in a stand-alone backend), which makes it 
easier to understand what's going on, IMHO.

There was one pretty fundamental bug in the minsafestartpoint handling: 
it was always set when a WAL file was opened for reading. Which means it 
was also moved backwards when the recovery began by reading the WAL 
segment containing the last restart/checkpoint, rendering it useless for 
the purpose it was designed for. Fortunately that was easy to fix. Another 
tiny bug was that log_restartpoints was not respected, because it was 
stored in a variable in the startup process's memory, and wasn't seen by 
bgwriter.
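
For illustration, the invariant that the minsafestartpoint bug violated 
can be stated as "the value only ever moves forward". The sketch below is 
one way to enforce that, not necessarily how the patch fixes it; the type 
and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical WAL position type; larger values are later in the WAL. */
typedef uint64_t WalPosition;

/*
 * Advance the minimum safe starting point, but never move it backwards.
 * Opening an older WAL segment at the start of recovery then leaves the
 * stored value untouched.
 */
void
advance_min_safe_start_point(WalPosition *min_safe, WalPosition candidate)
{
    assert(min_safe != NULL);

    if (candidate > *min_safe)
        *min_safe = candidate;
}
```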

One aspect that troubles me a bit is the changes in XLogFlush. I guess 
we no longer have the problem that you can't start up the database if 
we've read in a corrupted page from disk, because we now start up before 
checkpointing. However, it does mean that if a corrupt page is read into 
shared buffers, we'll never be able to checkpoint. But then again, I 
guess that's already true without this patch.


I feel quite good about this patch now. Given the amount of code churn, 
it requires testing, and I'll read it through one more time after 
sleeping on it. Simon, do you see anything wrong with this?

(this patch is also in my git repository at git.postgresql.org, branch 
recoveryinfra.)

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: recovery-infra-dee8f65be.patch
Description: text/x-diff (44.0 KB)
