Re: BUG #15346: Replica fails to start after the crash

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: cyberdemn(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, alvherre(at)2ndquadrant(dot)com, 9erthalion6(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15346: Replica fails to start after the crash
Date: 2018-08-31 00:48:46
Message-ID: 20180831.094846.52751456.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-bugs pgsql-hackers

At Thu, 30 Aug 2018 11:57:05 -0700, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20180830185705(dot)GF15446(at)paquier(dot)xyz>
> On Thu, Aug 30, 2018 at 08:31:36PM +0200, Alexander Kukushkin wrote:
> > 2018-08-30 19:34 GMT+02:00 Michael Paquier <michael(at)paquier(dot)xyz>:
> >> I have been struggling for a couple of hours to get a deterministic test
> >> case out of my pocket, and I did not get one as you would need to get
> >> the bgwriter to flush a page before crash recovery finishes, we could do
> >
> > In my case the active standby server has crashed, it wasn't in the
> > crash recovery mode.
>
> That's what I meant, a standby crashed and then restarted, doing crash
> recovery before moving on with archive recovery once it was done with
> all its local WAL.
>
> > Minimum recovery ending location is AB3/4A1B3118, but at the same time
> > I managed to find pages from 0000000500000AB300000053 on disk (at
> > least in the index files). That could only mean that bgwriter was
> > flushing dirty pages, but pg_control wasn't properly updated and it
> > happened not during recovery after hardware crash, but while the
> > postgres was running before the hardware crash.
>
> Exactly, that would explain the incorrect reference.
>
> > The only possible way to recover such standby - cut off all possible
> > connections and let it replay all WAL files it managed to write to
> > disk before the first crash.
>
> Yeah... I am going to apply the patch after another lookup, that will
> fix the problem moving forward. Thanks for checking by the way.

Please wait a bit. I have a concern about this.
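
For reference, the mismatch quoted above (minimum recovery ending
location AB3/4A1B3118 versus pages from 0000000500000AB300000053) can be
checked with a small sketch. It assumes the default 16MB WAL segment
size and that the minimum recovery point is on timeline 5, neither of
which is stated in the report; the helper names are made up for
illustration and are not PostgreSQL functions.

```python
# Sketch only: assumes the default 16MB WAL segment size (not stated in the report).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline, lsn, seg_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file that contains the given LSN."""
    segno = lsn // seg_size
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def segment_start_lsn(file_name, seg_size=WAL_SEGMENT_SIZE):
    """First LSN covered by a WAL segment file name (timeline part ignored)."""
    log_id = int(file_name[8:16], 16)
    seg = int(file_name[16:24], 16)
    return (log_id << 32) + seg * seg_size


# Values taken from the report; timeline 5 is an assumption.
min_recovery_point = parse_lsn("AB3/4A1B3118")
segment_with_pages = "0000000500000AB300000053"

print(wal_file_name(5, min_recovery_point))
print(segment_start_lsn(segment_with_pages) > min_recovery_point)
```

Under those assumptions the minimum recovery point falls in segment
0000000500000AB30000004A, while the pages found on disk come from a
later segment, which is consistent with pg_control lagging behind the
flushed pages.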

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
