Re: [HACKERS] Crash on promotion when recovery.conf is renamed

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Crash on promotion when recovery.conf is renamed
Date: 2018-07-09 01:29:55
Message-ID: 20180709012955.GD1467@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Fri, Jul 06, 2018 at 09:09:27PM +0900, Michael Paquier wrote:
> Sure. I think I can finish the set on Monday JST then.

So, I have been able to back-patch things down to 9.5, but further down
I am not really convinced that it is worth meddling with that,
particularly in light of 7cbee7c which has reworked the way partial
segments on older timelines are handled at the end of promotion. The
portability issues I have found is related to the timeline number
exitArchiveRecovery uses which comes for the WAL reader hence this gets
set to the timeline from the last page read by the startup process.
This can actually cause quite a bit of trouble at the end of recovery
as we would get a failure when trying to copy the last segment from the
old timeline to the new timeline. Well, it could be possible to fix
things properly by roughly back-porting 7cbee7c down to REL9_4_STABLE
but that's not really worth the risk, and moving exitArchiveRecovery()
and PrescanPreparedTransactions() around is a straight-forward move with
9.5~ thanks to this commit.

I have also done nothing yet for the detection of corrupted 2PC files
which get ignored at recovery. While looking again at the patch I sent
upthread, the thing was actually missing some more error handling in
ReadTwoPhaseFile(). In particular, with the proposed patch we would
still not report an error if a 2PC file cannot be opened because of for
example EPERM. In FinishPreparedTransaction, one would example get a
simply a *crash* with no hints about what happened. I have a patch for
that which still needs some polishing, and I will start a new thread on
the matter.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-07-09 01:34:20 Re: Copy function for logical replication slots
Previous Message David Rowley 2018-07-09 01:26:40 Re: [HACKERS] Secondary index access optimizations