Re: [BUG] Archive recovery failure on 9.3+.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hlinnakangas(at)vmware(dot)com
Cc: katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] Archive recovery failure on 9.3+.
Date: 2014-02-14 08:38:57
Message-ID: 20140214.173857.65272356.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

Before getting to the main topic...

At Thu, 13 Feb 2014 19:45:38 +0200, Heikki Linnakangas wrote
> On 02/13/2014 06:47 PM, Heikki Linnakangas wrote:
> > On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
> >> The behavior where we prefer a segment from archive with lower TLI over
> >> a file with higher TLI in pg_xlog actually changed in commit
> >> a068c391ab0. Arguably changing it wasn't a good idea, but the problem
> >> your test script demonstrates can be fixed by not archiving the partial
> >> segment, with no change to the preference of archive/pg_xlog. As
> >> discussed, archiving a partial segment seems like a bad idea anyway, so
> >> let's just stop doing that.

It certainly makes things simpler, and I rather like the idea.
However, as long as the final, possibly partial, segment of the
lower TLI is actually created, and the recovery mechanism lets
users request recovery operations that require such segments
(recovery_target_timeline does this), a "perfect archive" (that
is, an archive that can support every kind of restore operation)
will necessarily contain such duplicate segments, I believe.
Besides, I suspect that policy would make archive/restore
operations much more difficult: DBAs would be stuck with the
tedious work of picking out of the haystack only the segments
actually needed for the recovery at hand. That sounds rather
gloomy.

# However, I also doubt the wisdom of stockpiling archived
# segments that span so many timelines; even two generations
# are enough to trigger this issue.

Anyway, returning to the topic,

> > After some further thought, while not archiving the partial segment
> > fixes your test script, it's not enough to fix all variants of the
> > problem. Even if archive recovery doesn't archive the last, partial,
> > segment, if the original master server is still running, it's entirely
> > possible that it fills the segment and archives it. In that case,
> > archive recovery will again prefer the archived segment with lower TLI
> > over the segment with newer TLI in pg_xlog.

Yes, that is the generalized description of the case I
mentioned. (Though I hadn't arrived at that thought myself :)

> > So I agree we should commit the patch you posted (or something to that
> > effect). The change to not archive the last segment still seems like a
> > good idea, but perhaps we should only do that in master.

My opinion on duplicate segments on older timelines is as
described above.

> To draw this to conclusion, barring any further insights to this, I'm
> going to commit the attached patch to master and REL9_3_STABLE. Please
> have a look at the patch, to see if I'm missing something. I modified
> the state machine to skip over XLOG_FROM_XLOG state, if reading in
> XLOG_FROM_ARCHIVE failed; otherwise you first scan the archive and
> pg_xlog together, and then pg_xlog alone, which is pointless.
>
> In master, I'm also going to remove the "archive last segment on old
> timeline" code.

Thank you for finishing the patch. I hadn't considered the
behavior after a failure in XLOG_FROM_ARCHIVE. With your change,
the state machine cycles through the sources without the extra,
pointless round. The recovery process can now pick up the segment
on the highest (expected) TLI among those with the same segment
ID, regardless of whether it lives in the archive or in pg_xlog.
I think the recovery process will now cope with the "perfect"
archives described above for all types of recovery operation. The
state machine loop, which takes fallback from the archive to
pg_xlog into account, now seems somewhat more complicated than
necessary, but it does no harm.
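The difference in selection order can be sketched as a small
model. This is not PostgreSQL code: seg_inventory,
pick_archive_first and pick_highest_tli are hypothetical names,
the expected TLIs are assumed to be listed newest first, and
within a single TLI the archive is assumed to be checked before
pg_xlog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the two places a segment can come from. */
typedef enum { SRC_NONE, SRC_ARCHIVE, SRC_PG_XLOG } seg_source;

/* Hypothetical inventory: which TLIs hold a copy of one given segment. */
typedef struct {
    const uint32_t *archive_tlis; size_t n_archive;
    const uint32_t *pgxlog_tlis;  size_t n_pgxlog;
} seg_inventory;

typedef struct { uint32_t tli; seg_source src; } seg_choice;

static bool has_tli(const uint32_t *tlis, size_t n, uint32_t tli)
{
    assert(tlis != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tlis[i] == tli)
            return true;
    return false;
}

/* Old order: exhaust the archive over all expected TLIs first, and only
 * then look in pg_xlog, so a lower-TLI archived copy beats a higher-TLI
 * copy sitting in pg_xlog. */
static seg_choice pick_archive_first(const seg_inventory *inv,
                                     const uint32_t *expected, size_t n_exp)
{
    for (size_t i = 0; i < n_exp; i++)
        if (has_tli(inv->archive_tlis, inv->n_archive, expected[i]))
            return (seg_choice){ expected[i], SRC_ARCHIVE };
    for (size_t i = 0; i < n_exp; i++)
        if (has_tli(inv->pgxlog_tlis, inv->n_pgxlog, expected[i]))
            return (seg_choice){ expected[i], SRC_PG_XLOG };
    return (seg_choice){ 0, SRC_NONE };
}

/* New order: walk the expected TLIs newest first and check both
 * locations for each one, so the highest expected TLI wins wherever
 * its copy lives. */
static seg_choice pick_highest_tli(const seg_inventory *inv,
                                   const uint32_t *expected, size_t n_exp)
{
    for (size_t i = 0; i < n_exp; i++) {
        if (has_tli(inv->archive_tlis, inv->n_archive, expected[i]))
            return (seg_choice){ expected[i], SRC_ARCHIVE };
        if (has_tli(inv->pgxlog_tlis, inv->n_pgxlog, expected[i]))
            return (seg_choice){ expected[i], SRC_PG_XLOG };
    }
    return (seg_choice){ 0, SRC_NONE };
}
```

With an archived copy on TLI 1, a pg_xlog copy on TLI 2 and
expected TLIs {2, 1}, pick_archive_first returns the TLI 1 copy
from the archive (the failure discussed in this thread), while
pick_highest_tli returns the TLI 2 copy from pg_xlog.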

However, this part, which came from my original patch,

> readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
> currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY : currentSource);

sticks out far beyond the line-wrapping boundary and looks
somewhat untidy :(

Also, the conditional operator seems to blur the distinction
between XLOG_FROM_ARCHIVE and XLOG_FROM_ANY a bit. But I failed
to unify them in either direction, so I left it as is.
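A minimal model of the two pieces of logic discussed here,
assuming standby mode: the source passed to the reader (the
conditional in question, moved into a helper so the call stays
within the wrap margin) and the transition after a failed read.
The enum only mirrors the state names used in this thread and may
differ in detail from the real one; read_source_for and
next_source_after_failure are hypothetical, and treating streaming
as the state that follows an archive failure is an assumption, not
something taken from the patch.

```c
#include <assert.h>

/* Simplified mirror of the source states named in this thread. */
typedef enum {
    XLOG_FROM_ANY = 0,   /* archive and pg_xlog together */
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_XLOG,
    XLOG_FROM_STREAM
} XLogSource;

/* Source handed to the "any TLI" reader: in the archive state, look in
 * both the archive and pg_xlog; otherwise only in the current source. */
static XLogSource
read_source_for(XLogSource current)
{
    assert(current != XLOG_FROM_ANY);   /* ANY is never a state itself */
    return current == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY : current;
}

/* Next state after a failed read (assumption: standby mode). Since the
 * archive state already scanned pg_xlog via ANY, a failure there skips
 * the pg_xlog-only state instead of scanning pg_xlog a second time. */
static XLogSource
next_source_after_failure(XLogSource current)
{
    switch (current) {
        case XLOG_FROM_ARCHIVE:
        case XLOG_FROM_PG_XLOG:
            return XLOG_FROM_STREAM;
        case XLOG_FROM_STREAM:
        default:
            return XLOG_FROM_ARCHIVE;   /* go round again */
    }
}
```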

Finally, the attached patch differs from your last patch only in
the styling fix mentioned above. It applies to the current HEAD,
and I confirmed that it fixes this issue, but I have not tested
the lastSourceFailed section: simply removing files was not
enough to reach that code path.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment: 0001-Change-the-order-that-pg_xlog-and-WAL-archive-are-po_r1.patch (text/x-patch, 2.0 KB)
