Re: [BUG?] lag of minRecoveryPont in archive recovery

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG?] lag of minRecoveryPont in archive recovery
Date: 2012-12-11 09:14:18
Message-ID: 50C6F96A.7060305@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11.12.2012 08:07, Kyotaro HORIGUCHI wrote:
>> The cause is that CheckRecoveryConsistency() is called before rm_redo(),
>> as Horiguchi-san suggested upthead. Imagine the case where
>> minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE
>> record. When restarting the server with that minRecoveryPoint,
>> the followings would happen, and then PANIC occurs.
>>
>> 1. XLOG_SMGR_TRUNCATE record is read.
>> 2. CheckRecoveryConsistency() is called, and database is marked as
>> consistent since we've reached minRecoveryPoint (i.e., the location
>> of XLOG_SMGR_TRUNCATE).
>> 3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is
>> found.
>> 4. Since the database has already been marked as consistent, an invalid
>> page leads to PANIC.
>
> Exactly.
>
> In smgr_redo, EndRecPtr which is pointing the record next to
> SMGR_TRUNCATE, is used as the new minRecoveryPoint. On the other
> hand, during the second startup of the standby,
> CheckRecoveryConsistency checks for consistency by
> XLByteLE(minRecoveryPoint, EndRecPtr) which should be true at
> just BEFORE the SMGR_TRUNCATE record is applied. So
> reachedConsistency becomes true just before the SMGR_TRUNCATE
> record will be applied. Bang!

Ah, I see. I thought moving CheckRecoveryConsistency was just a
nice-to-have thing, so that we'd notice that we're consistent earlier,
so didn't pay attention to that part.

I think the real issue here is that CheckRecoveryConsistency should not
be comparing minRecoveryPoint against recoveryLastRecPtr, not EndRecPtr
as it currently does. EndRecPtr points to the end of the last record
*read*, while recoveryLastRecPtr points to the end of the last record
*replayed*. The latter is what CheckRecoveryConsistency should be
concerned with.

BTW, it occurs to me that we have two variables in the shared struct
that are almost the same thing: recoveryLastRecPtr and replayEndRecPtr.
The former points to the end of the last record successfully replayed,
while replayEndRecPtr points to the end of the last record successfully
replayed, or begin replayed at the moment. I find that confusing, so I
suggest that we rename recoveryLastRecPtr to make that more clear.
Included in the attached patch.

- Heikki

Attachment Content-Type Size
fix-check-recovery-consistency-1.patch text/x-diff 3.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2012-12-11 09:23:36 Re: too much pgbench init output
Previous Message Tom Lane 2012-12-11 07:12:17 Re: allowing multiple PQclear() calls