[BUG] Archive recovery failure on 9.3+.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [BUG] Archive recovery failure on 9.3+.
Date: 2013-12-12 02:00:02
Message-ID: 20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, we happened to see a server crash during archive recovery
under certain conditions.

After the TLI has been incremented, there can be a case where the
WAL file for the older timeline has been archived, but the file
with the same segment id for the newer timeline has not. Archive
recovery fails in that case with a PANIC error like the following:

| PANIC: record with zero length at 0/1820D40
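
To make the situation concrete, here is a small sketch of how the
two competing segment files would be named. It assumes the default
16 MB segment size and uses TLIs 1 and 2 purely for illustration;
wal_file_name is a hypothetical helper that mimics the usual
TLI/log/segment file naming and is not copied from the server
source.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: default 16 MB WAL segments, i.e. 256 segments per log id. */
#define SEG_SIZE         UINT64_C(0x1000000)
#define SEGS_PER_LOG_ID  (UINT64_C(0x100000000) / SEG_SIZE)

/*
 * Hypothetical helper: build the 24-hex-digit WAL file name
 * (8 digits TLI, 8 digits log id, 8 digits segment) for an LSN.
 */
static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t lsn)
{
    uint64_t segno = lsn / SEG_SIZE;

    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / SEGS_PER_LOG_ID),
             (unsigned) (segno % SEGS_PER_LOG_ID));
}
```

Under these assumptions, the LSN in the PANIC message above
(0/1820D40) falls in segment 1, so the two candidates would be
000000010000000000000001 for the older timeline and
000000020000000000000001 for the newer one.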

A reproduction script is attached. The issue occurs on 9.4dev and
9.3.2, but not on 9.2.6 or 9.1.11. The latter versions search
pg_xlog for the current TLI before trying the archive for older
TLIs.

This occurs while fetching the checkpoint redo record during
archive recovery.

> if (checkPoint.redo < RecPtr)
> {
>     /* back up to find the record */
>     record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false);

The cause is that the segment file for the older timeline in the
archive directory is preferred over the one for the newer timeline
in pg_xlog.

Looking into pg_xlog before trying the older TLIs in the archive,
as 9.2 and earlier do, fixes this issue. The attached patch is one
possible solution for 9.4dev.
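
The difference between the two search orders can be sketched with
a toy model. Everything here is hypothetical (the types, the
function names, and the assumption that the archive is checked
before pg_xlog within a single TLI); it is not the server code,
only an illustration of why the order of the two loops matters.
The array is assumed to list timelines newest first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of where a given segment can be found. */
typedef enum { LOC_ARCHIVE = 0, LOC_PG_XLOG = 1, LOC_COUNT } Location;

typedef struct
{
    int  tli;                   /* timeline id */
    bool present[LOC_COUNT];    /* segment exists in archive / pg_xlog? */
} TimelineFiles;

typedef struct { int tli; Location loc; bool found; } Hit;

/*
 * Archive-first order: scan the archive for every TLI, and only
 * then pg_xlog. This models the problematic behavior described above.
 */
static Hit
find_archive_first(const TimelineFiles *tl, size_t n)
{
    for (int loc = LOC_ARCHIVE; loc < LOC_COUNT; loc++)
        for (size_t i = 0; i < n; i++)
            if (tl[i].present[loc])
                return (Hit){ tl[i].tli, (Location) loc, true };
    return (Hit){ 0, LOC_ARCHIVE, false };
}

/*
 * Timeline-first order: for each TLI, newest first, check both
 * locations before falling back to an older TLI.
 */
static Hit
find_timeline_first(const TimelineFiles *tl, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (int loc = LOC_ARCHIVE; loc < LOC_COUNT; loc++)
            if (tl[i].present[loc])
                return (Hit){ tl[i].tli, (Location) loc, true };
    return (Hit){ 0, LOC_ARCHIVE, false };
}
```

With TLI 2 present only in pg_xlog and TLI 1 present only in the
archive, find_archive_first picks the TLI 1 file from the archive,
while find_timeline_first picks the TLI 2 file from pg_xlog, which
is the segment recovery actually needs.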

The attached files are:

- recvtest.sh: Reproduction script. Steps 1 and 2 set up the
condition and step 3 triggers the issue.

- archrecvfix_20131212.patch: The patch that fixes the issue. With
it, archive recovery reads pg_xlog before trying older TLIs in the
archive, as 9.2 and earlier do.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment                  Content-Type  Size
unknown_filename            text/plain    1.7 KB
archrecvfix_20131212.patch  text/x-patch  849 bytes
