Re: PANIC during crash recovery of a recently promoted standby

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: pavan(dot)deolasee(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PANIC during crash recovery of a recently promoted standby
Date: 2018-06-07 10:58:29
Message-ID: 20180607.195829.164100769.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Thu, 24 May 2018 16:57:07 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20180524075707(dot)GE15445(at)paquier(dot)xyz>
> On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote:
> > Looks like I didn't understand Alvaro's comment when he mentioned it to me
> > off-list. But I now see what Michael and Alvaro mean and that indeed seems
> > like a problem. I was thinking that the test for (ControlFile->state ==
> > DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated
> > after the standby is promoted. While that's true for a DB_IN_PRODUCTION, the
> > RestartPoint may finish after we have written end-of-recovery record, but
> > before we're in production and thus the minRecoveryPoint may again be set.
>
> Yeah, this has been something I considered as well first, but I was not
> confident enough that setting up minRecoveryPoint to InvalidXLogRecPtr
> was actually a safe thing for timeline switches.
>
> So I have spent a good portion of today testing and playing with it to
> be confident enough that this was right, and I have finished with the
> attached. The patch adds a new flag to XLogCtl which marks if the
> control file has been updated after the end-of-recovery record has been
> written, so as minRecoveryPoint does not get updated because of a
> restart point running in parallel.
>
> I have also reworked the test case you sent, removing the manuals sleeps
> and replacing them with correct wait points. There is also no point to
> wait after promotion as pg_ctl promote implies a wait. Another
> important thing is that you need to use wal_log_hints = off to see a
> crash, which is something that allows_streaming actually enables.
>
> Comments are welcome.

As the result of some poking around, my dignosis is different
from yours.

(I believe that) By definition recovery doesn't end until the
end-of-recovery check point ends so from the viewpoint I think it
is wrong to clear ControlFile->minRecoveryPoint before the end.

Invalid-page checking during crash recovery is hamful rather than
useless. It is done by CheckRecoveryConsistency even in crash
recovery against its expectation because there's a case where
minRecoveryPoint is valid but InArchiveRecovery is false. The two
variable there seems to be in contradiction with each other.

The immediate cause of the contradition is that StartXLOG wrongly
initializes local minRecoveryPoint from control file's value even
under crash recovery. updateMinRecoveryPoint also should be
turned off during crash recovery. The case of crash after last
checkpoint end has been treated in UpdateMinRecoveryPoint but
forgot to consider this case, where crash recovery has been
started while control file is still holding valid
minRecoveryPoint.

Finally, the attached patch seems fixing the issue. It passes
015_promotion.pl and the problem doesn't happen with
014_promotion_bug.pl. Also this passes the existing tests
002-014.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Fix-wrong-behavior-during-crash-recovery.patch text/x-patch 3.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2018-06-07 12:41:23 Re: Typo in planner README
Previous Message Ashutosh Bapat 2018-06-07 10:42:58 Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.