Re: PANIC during crash recovery of a recently promoted standby

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: pavan(dot)deolasee(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: PANIC during crash recovery of a recently promoted standby
Date: 2018-06-28 01:37:51
Message-ID: 20180628013751.GA11054@paquier.xyz
Lists: pgsql-hackers

Adding Heikki and Andres in CC here for awareness.

On Wed, Jun 27, 2018 at 05:29:38PM +0900, Michael Paquier wrote:
> I have spent a bit of time testing this on HEAD, 10 and 9.6. For 9.5,
> 9.4 and 9.3 I have reproduced the failure and tested the patch, but I
> lacked time to perform more tests. The patch set for 9.3~9.5 applies
> without conflict across the 3 branches. 9.6 has a conflict in a
> comment, and v10 had an extra comment conflict.
>
> Feel free to have a look, I am not completely done with this stuff and
> I'll work more tomorrow on checking 9.3~9.5.

I have now been able to spend the time I wanted on this patch series,
testing 9.3 to 9.5. Attached are a couple of patches you can use to
reproduce the failures for all the branches:
- For master and 10, the tests are included in the patch and are
proposed for commit.
- On 9.6, I had to tweak the TAP scripts, as pg_ctl start only switched
to wait mode by default in v10.
- On 9.5, there is a tweak to src/Makefile.global.in which cleans up
tmp_check, and a couple of GUCs are not compatible.
- On 9.4, I had to tweak src/Makefile.global.in so that the temporary
installation path is correct. Again, some GUCs had to be tweaked.
- On 9.3, there is no TAP infrastructure, so I tweaked
src/test/recovery/Makefile to be able to run the tests.

I have also created a bash script which emulates what the TAP test does,
which is attached. Apparently for timing reasons, I have not been able
to reproduce the problem with it. Anyway, with the attached sets it is
possible to run (in effect, back-port) the TAP suite so that the
problematic test case can be run, and it shows the failure, so we can
use that.

Thoughts? I would love more input about the patch concept.
--
Michael

Attachment                     Content-Type       Size
promote_panic.bash             text/plain         2.8 KB
promote-panic-test-93.tar.gz   application/gzip   21.6 KB
promote-panic-test-94.tar.gz   application/gzip   23.2 KB
promote-panic-test-95.tar.gz   application/gzip   23.1 KB
promote-panic-test-96.tar.gz   application/gzip   18.9 KB
