Re: BUG #15346: Replica fails to start after the crash

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15346: Replica fails to start after the crash
Date: 2018-08-29 12:10:51
Message-ID: 20180829121051.GC5903@paquier.xyz
Lists: pgsql-bugs pgsql-hackers

On Wed, Aug 29, 2018 at 08:59:16AM +0200, Alexander Kukushkin wrote:
> Why does block 72478 of the index relfile not meet our expectations
> (it contains so few tuples)?
> The answer to this question is in the page header. The LSN written in
> the index page header is AB3/56BF3B68.
> That can mean only one thing: while postgres was running before the
> crash, it managed to apply the WAL stream up to at least AB3/56BF3B68,
> which is far ahead of "Minimum recovery ending location: AB3/4A1B3118".
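
To make the comparison of the two locations concrete, here is a minimal
sketch in Python. It assumes the usual textual LSN form of two hexadecimal
halves separated by a slash (high 32 bits, then low 32 bits); the helper
name parse_lsn is made up for illustration and is not a PostgreSQL API.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as 'AB3/56BF3B68' into a 64-bit integer.

    Assumption: the part before the slash is the high 32 bits and the
    part after it is the low 32 bits, both written in hexadecimal.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Values quoted in the message above.
page_lsn = parse_lsn("AB3/56BF3B68")        # LSN found in the index page header
min_recovery = parse_lsn("AB3/4A1B3118")    # "Minimum recovery ending location"

# A page LSN beyond the minimum recovery point means WAL was replayed
# past the location that the control file records.
gap_bytes = page_lsn - min_recovery
print(page_lsn > min_recovery)   # True
print(gap_bytes)                 # 212077136 bytes, roughly 202 MiB of WAL
```

Under these assumptions, the page was last modified by a record about
202 MiB of WAL past the recorded minimum recovery point.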

Yeah, that's the key point. Do you know by chance what was the content
of the control file for each standby you have upgraded to 9.6.10 before
starting them with the new binaries? You mentioned a cluster of three
nodes, so I guess that you have two standbys, and that one of them did
not see the symptoms discussed here, while the other saw them. Do you
still have the logs of the recovery just after starting the other
standby with 9.6.10 which did not see the symptom? All your standbys
are using the background worker which would cause the btree deletion
code to be scanned, right?

I am trying to work on a reproducer with a bgworker starting once
recovery has been reached, without success yet. Does your cluster
generate some XLOG_PARAMETER_CHANGE records? In some cases, 9.4.8 could
have updated minRecoveryPoint to go backward, which is something that
8d68ee6 has been working on addressing.

Did you also try to use local WAL segments up to where AB3/56BF3B68 is
applied, and also have a restore_command so that extra WAL segment
fetches from the archive would happen?
--
Michael
