Re: BUG #15346: Replica fails to start after the crash

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>,Dmitry Dolgov <9erthalion6(at)gmail(dot)com>,pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15346: Replica fails to start after the crash
Date: 2018-08-29 06:17:23
Message-ID: 20180829061723.GA5903@paquier.xyz
Lists: pgsql-bugs pgsql-hackers

On Sat, Aug 25, 2018 at 09:54:39AM +0200, Alexander Kukushkin wrote:
> Is there a way to recover from such a situation? Should the postgres
> in such case do comparison of LSNs and if the LSN on the page is
> higher than the current LSN simply return InvalidTransactionId?

Hmm. That does not sound right to me. If the page has a header LSN
higher than the one replayed, we should not see it, as consistency has
normally been reached. Going by your backtrace, XLogReadBufferExtended()
seems to complain in your case about a page which should not exist.
What's the length of the relation file at this point? If the relation
is considered as having fewer blocks, shouldn't we just ignore the page
when we're trying to delete items on it, and return
InvalidTransactionId? I have to admit that I am not the best specialist
with this code.

hblkno also looks unreasonably high to me compared with the other blkno
references you mention upthread.

> Apparently, if there are no connections open postgres simply is not
> running this code and it seems ok.

Yeah, that's used for standby conflicts.
--
Michael
