Re: 9.4 checksum errors in recovery with gin index

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.4 checksum errors in recovery with gin index
Date: 2014-05-07 17:21:26
Message-ID: CAMkU=1weqUAVPW2F+c3Ok5VFfvEjyfJXb4dH8B7v57D5WXftkA@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> Hi,
>
> On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > When recovering from a crash (with injection of a partial page write at
> > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> > verification failure.
> >
> > 16396 is a gin index.
>
> Over which type? What was the load? make check?
>

A gin index on text[].

The load is a variation of the crash recovery tester I've been using for the
last few years, this time adapted to use a gin index in a rather unnatural
way. I just increment a counter on a random row repeatedly via a unique
key, but for this purpose that unique key is stuffed into a text[] along with
a bunch of cruft. The cruft is text representations of negative integers;
the actual key is the text representation of a nonnegative integer.
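
To illustrate the data shape described above, here is a minimal Python sketch of how one such text[] value could be built. It is not the actual driving program (that is in the linked test harness); the function names, the amount of cruft, and the table and column names in the example statement are all hypothetical.

```python
import random

def make_array_elements(key, n_cruft=10, rng=None):
    """Build the elements of one text[] value: the real key (a nonnegative
    integer as text) mixed in with cruft (negative integers as text).
    n_cruft is an assumed default, not taken from the real test."""
    if key < 0:
        raise ValueError("the real key must be nonnegative")
    rng = rng or random.Random()
    cruft = [str(-rng.randint(1, 1_000_000)) for _ in range(n_cruft)]
    elems = cruft + [str(key)]
    rng.shuffle(elems)  # the key's position within the array is arbitrary
    return elems

def to_pg_array_literal(elems):
    """Render the elements as a PostgreSQL array literal, e.g. {-5,42,-17}.
    Plain integer texts need no quoting inside the braces."""
    return "{" + ",".join(elems) + "}"

# Hypothetical statement for bumping the counter through the gin index;
# "foo", "count" and "text_array" are made-up names.
UPDATE_SQL = "UPDATE foo SET count = count + 1 WHERE text_array @> %s"
```

Because every cruft element is negative and the key is nonnegative, a containment search on the key's text can only match the one row that holds that key.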

The test harness (patch to induce crashes, and two driving programs) and a
preserved data directory are here:

https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing

(role jjanes, database jjanes)

As far as I can tell, this problem goes back to the beginning of page
checksums.

> > If I have it ignore checksum failures, there is no apparent misbehavior.
> > I'm trying to bisect it, but it could take a while and I thought someone
> > might have some theories based on the log:
>
> If you have the WAL a pg_xlogdump grepping for everything referring to
> that block would be helpful.
>

The only record which mentions block 28486 by name is this one:

rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node: 1663/16384/16396 blkno: 28486

However, I think that that record precedes the recovery start point.
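
For anyone repeating the search, a rough Python sketch of filtering saved pg_xlogdump output for one block follows. It assumes each record is on a single line and spells the block out as "node: spc/db/rel blkno: N", as the record above does; record types that print their block references differently would be missed, which is why this only finds mentions "by name". The function name is hypothetical.

```python
import re

def records_for_block(lines, relnode, blkno):
    """Return the dump lines that name the given relfilenode and block.
    Only matches the 'node: a/b/c ... blkno: N' form seen in the record
    above; other description formats are an open assumption."""
    node_pat = re.compile(r"node: \d+/\d+/%d\b" % relnode)
    blk_pat = re.compile(r"blkno: %d\b" % blkno)
    return [ln for ln in lines if node_pat.search(ln) and blk_pat.search(ln)]
```

Usage would be along the lines of records_for_block(open("dump.txt"), 16396, 28486), where dump.txt is a hypothetical file holding the saved output.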

Cheers,

Jeff
