Skip site navigation (1) Skip section navigation (2)

Re: Recovery bug

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Recovery bug
Date: 2010-10-26 07:48:27
Message-ID: 4CC687CB.6080306@enterprisedb.com (view raw or flat)
Thread:
Lists: pgsql-bugs
On 25.10.2010 19:04, Jeff Davis wrote:
> On Mon, 2010-10-25 at 14:44 +0300, Heikki Linnakangas wrote:
>> It seems we should use ReadRecord instead of the lower-level
>> XLogPageRead function. One difference is that ReadRecord performs a
>> bunch of sanity checks on the record, while XLogPageRead just reads the
>> raw page. Extra sanity checking before removing backup_label seems like
>> a good idea. Another difference is that in standby-mode, ReadRecord will
>> retry until it succeeds. A standby server should keep retrying, even the
>> very first record, until it succeeds, otherwise we have a change in
>> behavior.
>
> The reason I didn't use ReadRecord is because it sets a global variable
> to point to the next location in the log, so that subsequent calls can
> just pass NULL for the location.

True. XLogPageRead is new in 9.0, however. We'll have to use ReadRecord 
or invent something new for back-branches anyway.

> It looks like the patch leaves the global variable pointing just after
> the redo location rather than the checkpoint. I haven't tested your
> patch yet, but it looks like some of the following code depends on
> ReadRecord(NULL,...) fetching the record right after the checkpoint
> record; so I think something else is required if you want to use
> ReadRecord.

Hmm, the next call to ReadRecord is this:

> 		/*
> 		 * Find the first record that logically follows the checkpoint --- it
> 		 * might physically precede it, though.
> 		 */
> 		if (XLByteLT(checkPoint.redo, RecPtr))
> 		{
> 			/* back up to find the record */
> 			record = ReadRecord(&(checkPoint.redo), PANIC, false);
> 		}
> 		else
> 		{
> 			/* just have to read next record after CheckPoint */
> 			record = ReadRecord(NULL, LOG, false);
> 		}

In the first case, the location is given explicitly. In the second case, 
the redo pointer equals the checkpoint record, so the current position 
is correct even with the patch. It makes me slightly nervous, though. 
It's correct today, but if someone adds code between the backup_label 
check and this that assumes that the current position is the checkpoint 
record, it'll fail. Then again, any new ReadRecord call in such added 
code would also break the assumption in the above block that the current 
position is the checkpoint record.

In the case that the redo pointer is the same as the checkpoint record, 
we don't need to re-fetch the checkpoint record. I've added a test for 
that in the attached patch.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: recovery-redo-3.patch
Description: text/x-diff (1.7 KB)

In response to

Responses

pgsql-bugs by date

Next:From: Vincent MauryDate: 2010-10-26 13:17:36
Subject: Re: BUG #5725: server couldn't start when installing on liveCD
Previous:From: Robert HaasDate: 2010-10-26 02:37:31
Subject: Re: BUG #5725: server couldn't start when installing on liveCD

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group