Re: BUG #15346: Replica fails to start after the crash

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15346: Replica fails to start after the crash
Date: 2018-08-29 13:05:25
Message-ID: CAFh8B==_6cY1f7rF4oxK+wjpWKaYzs96Q6Tn5=4QRWJiVnjmDg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Hi,

2018-08-29 14:10 GMT+02:00 Michael Paquier <michael(at)paquier(dot)xyz>:
> Yeah, that's the pinpoint. Do you know by chance what was the content
> of the control file for each standby you have upgraded to 9.6.10 before
> starting them with the new binaries? You mentioned a cluster of three

No, I don't. Right after the upgrade they started normally and have
been working for a few days. I believe the control file was overwritten
a few hundred times before the instance crashed.

> nodes, so I guess that you have two standbys, and that one of them did
> not see the symptoms discussed here, while the other saw them. Do you

The other node didn't crash and is still working.

> still have the logs of the recovery just after starting the other
> standby with 9.4.10 which did not see the symptom? All your standbys

I don't think it is really related to the minor upgrade. After the
upgrade the whole cluster was running for about 3 days.
Every day it generates about 2000 WAL segments; the total volume of
daily WAL is very close to the size of the cluster, which is 38GB.

> are using the background worker which would cause the btree deletion
> code to be scanned, right?

Well, any open connection to the database will produce the same
result. In our case we are using Patroni for automatic failover, which
connects immediately after postgres has started and keeps this
connection permanently open. The background worker just happened to be
faster than anything else.

> I am trying to work on a reproducer with a bgworker starting once
> recovery has been reached, without success yet. Does your cluster
> generate some XLOG_PARAMETER_CHANGE records? In some cases, 9.4.8 could
> have updated minRecoveryPoint to go backward, which is something that
> 8d68ee6 has been working on addressing.

No, it doesn't.

>
> Did you also try to use local WAL segments up where AB3/56BF3B68 is
> applied, and also have a restore_command so as extra WAL segment fetches
> from the archive would happen?

If there are no connections open, it applies the necessary amount of WAL
segments (with the help of restore_command, of course) and reaches
real consistency. After that, it is possible to connect, and the
startup process doesn't abort anymore.

BTW, I am thinking that we should return InvalidTransactionId from
btree_xlog_delete_get_latestRemovedXid if the index page we read from
disk is newer than the xlog record we are currently processing. Please
see the attached patch.

--
Alexander Kukushkin

Attachment Content-Type Size
nbtxlog.c.patch text/x-patch 607 bytes
