Re: BUG #15346: Replica fails to start after the crash

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15346: Replica fails to start after the crash
Date: 2018-08-30 18:57:05
Message-ID: 20180830185705.GF15446@paquier.xyz
Lists: pgsql-bugs pgsql-hackers

On Thu, Aug 30, 2018 at 08:31:36PM +0200, Alexander Kukushkin wrote:
> 2018-08-30 19:34 GMT+02:00 Michael Paquier <michael(at)paquier(dot)xyz>:
>> I have been struggling for a couple of hours to get a deterministic test
>> case out of my pocket, and I did not get one as you would need to get
>> the bgwriter to flush a page before crash recovery finishes, we could do
>
> In my case the active standby server has crashed, it wasn't in the
> crash recovery mode.

That's what I meant: a standby crashed and was then restarted, so it
first does crash recovery, then moves on to archive recovery once it is
done with all its local WAL.

> Minimum recovery ending location is AB3/4A1B3118, but at the same time
> I managed to find pages from 0000000500000AB300000053 on disk (at
> least in the index files). That could only mean that bgwriter was
> flushing dirty pages, but pg_control wasn't properly updated and it
> happened not during recovery after hardware crash, but while the
> postgres was running before the hardware crash.

Exactly, that would explain the incorrect reference.
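
To make the mismatch concrete, here is a minimal sketch of the arithmetic, assuming the default 16MB WAL segment size (the actual setting on the affected cluster is not stated in this thread). The helper names are hypothetical and are not part of PostgreSQL. It converts the reported minimum recovery ending location and the WAL segment file name into comparable 64-bit positions.

```python
# Assumption: default WAL segment size of 16MB; the cluster in the report
# may have been built with a different value.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN written as 'XXX/YYYYYYYY' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(lsn):
    """Format a 64-bit position in the usual 'XXX/YYYYYYYY' notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def segment_start_lsn(wal_file_name, segment_size=WAL_SEGMENT_SIZE):
    """Return (timeline, start LSN) for a 24-character WAL segment file name."""
    if len(wal_file_name) != 24:
        raise ValueError("expected a 24-character WAL segment file name")
    timeline = int(wal_file_name[0:8], 16)
    log = int(wal_file_name[8:16], 16)
    seg = int(wal_file_name[16:24], 16)
    # Number of segments that fit in one 4GB "log" unit.
    segments_per_log = 0x100000000 // segment_size
    segment_number = log * segments_per_log + seg
    return timeline, segment_number * segment_size


# Values quoted in the report above.
min_recovery_point = parse_lsn("AB3/4A1B3118")
timeline, seg_start = segment_start_lsn("0000000500000AB300000053")

print("segment starts at", format_lsn(seg_start), "on timeline", timeline)
print("beyond minimum recovery point:", seg_start > min_recovery_point)
```

Under that assumption the segment starts past the recorded minimum recovery ending location, so any page changed by WAL from that segment is newer than what pg_control claims is required for consistency.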

> The only possible way to recover such standby - cut off all possible
> connections and let it replay all WAL files it managed to write to
> disk before the first crash.

Yeah... I am going to apply the patch after another look; that will
fix the problem moving forward. Thanks for checking, by the way.
--
Michael
