Skip site navigation (1) Skip section navigation (2)

Re: Hot Backup with rsync fails at pg_clog if under load

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-10-25 07:03:31
Message-ID: CA+U5nMJksETN=sJEmu2FqhWNv2Uv63e_9T5OrZEGmviNnWEC5Q@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
On Mon, Oct 24, 2011 at 7:13 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> I think Simon's theory that we're starting recovery from the wrong place,
> i.e. should start with an earlier WAL location, is probably correct. The
> question is, why?

Err, that's not what I said and I don't mean that. Having said that,
what I said about pg_control being invalid would imply that, so is
wrong also.

We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
being derived later than it should be, which can cause problems if
this then means that whole pages are unitialised in subtrans. The bug
only shows up if you do enough transactions (2048 is always enough) to
move to the next subtrans page between the redo pointer and the
checkpoint record while at the same time we do not have a long running
transaction that spans those two points. That's just enough to happen
reasonably frequently on busy systems and yet just enough to have
slipped through testing.

We must either

1. During CreateCheckpoint() we should derive oldestActiveXid before
we derive the redo location

2. Change the way subtrans pages are initialized during recovery so we
don't rely on oldestActiveXid

I need to think some more before a decision on this in my own mind,
but I lean towards doing (1) as a longer term fix and doing (2) as a
short term fix for existing releases. I expect to have a fix later
today.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

pgsql-hackers by date

Next:From: Fujii MasaoDate: 2011-10-25 08:50:10
Subject: Re: Online base backup from the hot-standby
Previous:From: Heikki LinnakangasDate: 2011-10-25 06:44:30
Subject: Re: Online base backup from the hot-standby

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group