Re: PANIC during crash recovery of a recently promoted standby

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: pavan(dot)deolasee(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PANIC during crash recovery of a recently promoted standby
Date: 2018-06-22 05:34:02
Message-ID: 20180622.143402.131885418.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, sorry for the absence. I have looked at the second patch.

At Fri, 22 Jun 2018 13:45:21 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20180622044521(dot)GC5215(at)paquier(dot)xyz>
> On Fri, Jun 22, 2018 at 10:08:24AM +0530, Pavan Deolasee wrote:
> > On Fri, Jun 22, 2018 at 9:28 AM, Michael Paquier <michael(at)paquier(dot)xyz>
> > wrote:
> >> So an extra pair of eyes from another committer would be
> >> welcome. I am letting that cool down for a couple of days now.
> >
> > I am not a committer, so don't know if my pair of eyes count, but FWIW the
> > patch looks good to me except couple of minor points.
>
> Thanks for grabbing some time, Pavan. Any help is welcome!

In the previous mail:
> I have spotted two
> bug where I think the problem is not fixed: when replaying a WAL record
> XLOG_PARAMETER_CHANGE, minRecoveryPoint and minRecoveryPointTLI would
> still get updated from the control file values which can still lead to
> failures as CheckRecoveryConsistency could still happily trigger a
> PANIC, so I think that we had better maintain those values consistent as

The fixes to StartupXLOG, CheckRecoveryConsistency, ReadRecord and
xlog_redo look fine (functionally; see the remark below).

> long as crash recovery runs. And XLogNeedsFlush() also has a similar
> problem.

Here, on the other hand, this patch turns off
updateMinRecoveryPoint if minRecoveryPoint is invalid when
RecoveryInProgress() == true. However, RecoveryInProgress() == true
means that archive recovery has already started, and
minRecoveryPoint *should* be updated in that
condition. In fact, minRecoveryPoint is updated just below. If
this is really the right thing to do, I think that some explanation
of the reason is required here.

In xlog_redo there is still a "minRecoveryPoint != 0", which ought
to be written as "!XLogRecPtrIsInvalid(minRecoveryPoint)". (It
seems to be the only one. The double negation is a bit awkward, but
there are many instances of this kind of coding.)
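
To illustrate the point about the two spellings, here is a minimal
standalone sketch. It is not code from the patch: the typedef and
macros are local stand-ins written to mirror what PostgreSQL's
access/xlogdefs.h provides (as far as I recall, a 64-bit XLogRecPtr
with InvalidXLogRecPtr equal to 0), and the function names are
hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins, assumed to mirror PostgreSQL's access/xlogdefs.h. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)

/* Hypothetical helper: the raw spelling still present in xlog_redo. */
bool
min_recovery_point_is_set_raw(XLogRecPtr minRecoveryPoint)
{
    return minRecoveryPoint != 0;
}

/* Hypothetical helper: the spelling suggested in this mail. */
bool
min_recovery_point_is_set(XLogRecPtr minRecoveryPoint)
{
    return !XLogRecPtrIsInvalid(minRecoveryPoint);
}

/* Check that both spellings agree on a few sample WAL positions. */
void
check_spellings_agree(void)
{
    const XLogRecPtr samples[] = {
        InvalidXLogRecPtr, 1, UINT64_C(0x16B3748), UINT64_MAX
    };

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(min_recovery_point_is_set_raw(samples[i]) ==
               min_recovery_point_is_set(samples[i]));
}
```

Under these assumptions the two tests behave identically, so the
change would be about readability and consistency with the rest of
the file, not about behavior.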

# I'll go all-out next week.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
