Re: Hot standby, recovery infra

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, recovery infra
Date: 2009-01-29 10:11:56
Message-ID: 1233223916.4703.15.camel@ebony.2ndQuadrant
Lists: pgsql-hackers
On Thu, 2009-01-29 at 11:20 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-01-29 at 10:36 +0900, Fujii Masao wrote:
> >> Hi,
> >>
> >> On Wed, Jan 28, 2009 at 11:19 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >>>> I feel quite good about this patch now. Given the amount of code churn, it
> >>>> requires testing, and I'll read it through one more time after sleeping on
> >>>> it. Simon, do you see anything wrong with this?
> >>> I also read this patch and found something odd. I apologize if I misread it.
> >> If archive recovery fails after it reaches the last valid record
> >> in the last unfilled WAL segment, subsequent recovery might cause
> >> the following fatal error. This is because minSafeStartPoint indicates
> >> the end of the last unfilled WAL segment which subsequent recovery
> >> cannot reach. Is this a bug? (I'm not sure how to fix this problem
> >> because I don't understand yet why minSafeStartPoint is required.)
> >>
> >>> FATAL:  WAL ends before end time of backup dump
> > 
> > I think you're right. We need a couple of changes to avoid confusing
> > messages.
> 
> Hmm, we could update minSafeStartPoint in XLogFlush instead. That was 
> suggested when the idea of minSafeStartPoint was first thought of. 
> Updating minSafeStartPoint is analogous to flushing WAL: 
> minSafeStartPoint must be advanced to X before we can flush a data page 
> with LSN X. To avoid excessive controlfile updates, whenever we update 
> minSafeStartPoint, we can update it to the latest WAL record we've read.
> 
> Or we could simply ignore that error if we've reached minSafeStartPoint 
> - 1 segment, assuming that even though minSafeStartPoint is higher, we 
> can't have gone past the end of valid WAL records in the last segment in 
> previous recovery either. But that feels more fragile.
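
Heikki's first suggestion above (advance minSafeStartPoint before a data
page is flushed, and jump straight to the latest WAL record read so the
control file is not rewritten too often) could look roughly like the sketch
below. This is only an illustration under stated assumptions: the struct,
its fields and the function name are hypothetical, LSNs are modelled as
plain 64-bit integers, and the control file write is replaced by a counter.
It is not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified recovery state; not a PostgreSQL structure. */
typedef struct
{
    uint64_t minSafeStartPoint;  /* value currently stored in the control file */
    uint64_t lastReplayedLSN;    /* latest WAL record read during recovery */
    unsigned controlFileWrites;  /* stands in for real control file updates */
} SketchRecoveryState;

/*
 * Call before flushing a data page whose LSN is page_lsn.
 * Mirrors the WAL-before-data rule: minSafeStartPoint must cover page_lsn
 * before the page may reach disk.
 */
void
sketch_advance_min_safe_start_point(SketchRecoveryState *rs, uint64_t page_lsn)
{
    assert(rs != NULL);
    /* A page cannot carry an LSN beyond what recovery has replayed. */
    assert(page_lsn <= rs->lastReplayedLSN);

    if (page_lsn <= rs->minSafeStartPoint)
        return;                 /* already covered, no control file update */

    /* Jump to the latest record read, not just page_lsn, to batch updates. */
    rs->minSafeStartPoint = rs->lastReplayedLSN;
    rs->controlFileWrites++;
}
```

With lastReplayedLSN at 500, flushing a page with LSN 100 moves
minSafeStartPoint to 500 in one update, and a later flush of a page with
LSN 300 needs no further control file write.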

My proposed fix for Fujii-san's minSafeStartPoint bug is to introduce
another control file state, DB_IN_ARCHIVE_RECOVERY_BASE. This state would
show that we are still recovering up to the end of the base backup. Once
we reach minSafeStartPoint we switch state to DB_IN_ARCHIVE_RECOVERY and
set a baseBackupReached boolean, which then enables writing new
minSafeStartPoints when we open new WAL files in the future.

We then emit these messages only when in the DB_IN_ARCHIVE_RECOVERY_BASE
state:

  if (XLByteLT(EndOfLog, ControlFile->minRecoveryPoint) &&
      ControlFile->state == DB_IN_ARCHIVE_RECOVERY_BASE)
  {
    if (reachedStopPoint) /* stopped because of stop request */
      ereport(FATAL,
          (errmsg("requested recovery stop point is before end time of backup dump")));
    else /* ran off end of WAL */
      ereport(FATAL,
          (errmsg("WAL ends before end time of backup dump")));
  }
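
As a rough illustration of the state handling described above, here is a
self-contained sketch. Everything prefixed with Sketch/sketch_ is
hypothetical: LSNs are plain integers rather than XLogRecPtr, the control
file is an in-memory struct, and I assume "reaching" minSafeStartPoint means
the replayed position is at or past it. It is a model of the idea, not the
patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SketchLSN;     /* simplified stand-in for an LSN */

typedef enum
{
    SKETCH_IN_ARCHIVE_RECOVERY_BASE,  /* still replaying up to end of base backup */
    SKETCH_IN_ARCHIVE_RECOVERY        /* minSafeStartPoint has been reached */
} SketchDBState;

typedef struct
{
    SketchDBState state;
    SketchLSN     minSafeStartPoint;
    bool          baseBackupReached;
} SketchControlFile;

/* Call after replaying a WAL record that ends at replayed_lsn. */
void
sketch_note_replay(SketchControlFile *cf, SketchLSN replayed_lsn)
{
    assert(cf != NULL);
    if (cf->state == SKETCH_IN_ARCHIVE_RECOVERY_BASE &&
        replayed_lsn >= cf->minSafeStartPoint)
    {
        cf->state = SKETCH_IN_ARCHIVE_RECOVERY;
        cf->baseBackupReached = true;
    }
}

/* Call when a new WAL file is opened; only advances once the base is reached. */
void
sketch_open_wal_file(SketchControlFile *cf, SketchLSN new_point)
{
    assert(cf != NULL);
    if (cf->baseBackupReached && new_point > cf->minSafeStartPoint)
        cf->minSafeStartPoint = new_point;
}

/* Returns the FATAL message that would be raised, or NULL if none applies. */
const char *
sketch_end_of_recovery_error(const SketchControlFile *cf,
                             SketchLSN end_of_log,
                             bool reached_stop_point)
{
    assert(cf != NULL);
    if (end_of_log < cf->minSafeStartPoint &&
        cf->state == SKETCH_IN_ARCHIVE_RECOVERY_BASE)
    {
        if (reached_stop_point)     /* stopped because of stop request */
            return "requested recovery stop point is before end time of backup dump";
        return "WAL ends before end time of backup dump";   /* ran off end of WAL */
    }
    return NULL;
}
```

In this model, once the base backup end has been passed, WAL that ends
short of an advanced minSafeStartPoint (the case Fujii-san reported) no
longer produces the FATAL error, while WAL that ends before the base backup
is complete still does.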

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

