Re: PANIC during crash recovery of a recently promoted standby

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: pavan(dot)deolasee(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PANIC during crash recovery of a recently promoted standby
Date: 2018-07-02 07:25:13
Message-ID: 20180702.162513.154695753.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Fri, 22 Jun 2018 15:25:48 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20180622062548(dot)GE5215(at)paquier(dot)xyz>
> On Fri, Jun 22, 2018 at 02:34:02PM +0900, Kyotaro HORIGUCHI wrote:
> > Hello, sorry for the absense and I looked the second patch.
>
> Thanks for the review!
>
> > At Fri, 22 Jun 2018 13:45:21 +0900, Michael Paquier
> > <michael(at)paquier(dot)xyz> wrote in <20180622044521(dot)GC5215(at)paquier(dot)xyz>
> >> long as crash recovery runs. And XLogNeedsFlush() also has a similar
> >> problem.
> >
> > Here, on the other hand, this patch turns off
> > updateMinRecoverypoint if minRecoverPoint is invalid when
> > RecoveryInProgress() == true. Howerver RecovInProg() == true
> > means archive recovery is already started and and
> > minRecoveryPoint *should* be updated t for the
> > condition. Actually minRecoverypoint is updated just below. If
> > this is really right thing, I think that some explanation for the
> > reason is required here.
>
> LocalRecoveryInProgress is just a local copy of SharedRecoveryInProgress
> so RecoveryInProgress also returns true if crash recovery is running.
> But perhaps I am missing what you mean? The point here is that redo can
> call XLogNeedsFlush, no?

My concern at the time was the necessity to turn off
updateMinRecoveryPoint on the fly. (The previous comment seems a
bit confused, sorry.)

When minRecoveryPoint is invalid, there're only two possible
cases. It may be at very beginning of archive reovery or may be
running a crash recovery. In the latter case, we have detected
crash recovery before redo starts. So we can turn off
updateMinRecoveryPoint immediately and no further check is
needed and it is (I think) easier to understand.

> > In xlog_redo there still be "minRecoverypoint != 0", which ought
> > to be described as "!XLogRecPtrIsInvalid(minRecoveryPoint)". (It
> > seems the only one. Double negation is a bit uneasy but there are
> > many instance of this kind of coding.)
>
> It is possible to use directly a comparison with InvalidXLogRecPtr
> instead of a double negation.

I'm not sure whether it is abstraction of invalid value, or just
a short cut of the value. That's right if it's the
latter. (There's several places where invalid LSN is assumed to
be smaller than any valid values in the patch).

the second diff is the difference of the first patch from
promote_panic_master.diff

On further thought, as we confirmed upthread (and existing
comments are saying) that (minRecoveryPoint == 0)
!InArchiveRecovery are always equivalent, and
updateMinRecoveryPoint becomes equivalent to them by the v3
patch. That is, we can just remove the variable and the attached
third patch is that. It also passes all recovery tests including
the new 015.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
promote-panic_v3.patch text/x-patch 5.9 KB
diff_from_promote_panic_master.diff text/x-patch 1.1 KB
promote-panic_horiguti_v4.patch text/x-patch 6.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-07-02 07:31:32 Re: Copy function for logical replication slots
Previous Message Peter Eisentraut 2018-07-02 07:13:50 Re: [PATCH] Include application_name in "connection authorized" log message