Re: [BUG?] lag of minRecoveryPont in archive recovery

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG?] lag of minRecoveryPont in archive recovery
Date: 2012-12-11 15:04:34
Message-ID: CAHGQGwHB-ToOv1W52YRBX9+5vW2B5BvHad0XjfW6FfYS=5F-=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 11, 2012 at 6:14 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 11.12.2012 08:07, Kyotaro HORIGUCHI wrote:
>>>
>>> The cause is that CheckRecoveryConsistency() is called before rm_redo(),
>>> as Horiguchi-san suggested upthead. Imagine the case where
>>> minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE
>>> record. When restarting the server with that minRecoveryPoint,
>>> the followings would happen, and then PANIC occurs.
>>>
>>> 1. XLOG_SMGR_TRUNCATE record is read.
>>> 2. CheckRecoveryConsistency() is called, and database is marked as
>>> consistent since we've reached minRecoveryPoint (i.e., the location
>>> of XLOG_SMGR_TRUNCATE).
>>> 3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is
>>> found.
>>> 4. Since the database has already been marked as consistent, an invalid
>>> page leads to PANIC.
>>
>>
>> Exactly.
>>
>> In smgr_redo, EndRecPtr which is pointing the record next to
>> SMGR_TRUNCATE, is used as the new minRecoveryPoint. On the other
>> hand, during the second startup of the standby,
>> CheckRecoveryConsistency checks for consistency by
>> XLByteLE(minRecoveryPoint, EndRecPtr) which should be true at
>> just BEFORE the SMGR_TRUNCATE record is applied. So
>> reachedConsistency becomes true just before the SMGR_TRUNCATE
>> record will be applied. Bang!
>
>
> Ah, I see. I thought moving CheckRecoveryConsistency was just a nice-to-have
> thing, so that we'd notice that we're consistent earlier, so didn't pay
> attention to that part.
>
> I think the real issue here is that CheckRecoveryConsistency should not be
> comparing minRecoveryPoint against recoveryLastRecPtr, not EndRecPtr as it
> currently does. EndRecPtr points to the end of the last record *read*, while
> recoveryLastRecPtr points to the end of the last record *replayed*. The
> latter is what CheckRecoveryConsistency should be concerned with.
>
> BTW, it occurs to me that we have two variables in the shared struct that
> are almost the same thing: recoveryLastRecPtr and replayEndRecPtr. The
> former points to the end of the last record successfully replayed, while
> replayEndRecPtr points to the end of the last record successfully replayed,
> or begin replayed at the moment. I find that confusing, so I suggest that we
> rename recoveryLastRecPtr to make that more clear. Included in the attached
> patch.

The patch looks good. I confirmed that the PANIC error didn't happen,
with the patch.

xlog.c
> * Initialize shared replayEndRecPtr, recoveryLastRecPtr, and

s/recoveryLastRecPtr/lastReplayedEndRecPtr

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-12-11 15:06:50 Re: [WIP] pg_ping utility
Previous Message Andres Freund 2012-12-11 14:47:10 Re: pg_dump cosmetic problem while dumping/restoring rules