Re: regression test failed when enabling checksum

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: regression test failed when enabling checksum
Date: 2013-04-02 01:53:09
Message-ID: 1364867589.7580.296.camel@sussancws0025
Lists: pgsql-hackers

On Mon, 2013-04-01 at 10:37 -0700, Jeff Janes wrote:

> Over 10,000 cycles of crash and recovery, I encountered two cases of
> checksum failures after recovery, example:
>
>
> 14264 SELECT 2013-03-28 13:08:38.980 PDT:WARNING: page verification
> failed, calculated checksum 7017 but expected 1098
> 14264 SELECT 2013-03-28 13:08:38.980 PDT:ERROR: invalid page in block
> 77 of relation base/16384/2088965
>
> 14264 SELECT 2013-03-28 13:08:38.980 PDT:STATEMENT: select sum(count)
> from foo

It would be nice to know whether that's an index or a heap page.
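
One way to check, sketched here under assumptions: the path in the ERROR line has the form base/<database OID>/<relfilenode>, so the relation can be looked up in pg_class from a session connected to that database. The helper names below (parse_relation_path, lookup_queries) are hypothetical and only build the catalog queries; they do not connect to a server. In the result, relkind 'r' would mean an ordinary table (heap), 'i' an index, and 't' a TOAST table.

```python
import re


def parse_relation_path(path):
    """Split 'base/<db oid>/<relfilenode>' into two integers.

    Hypothetical helper; only handles paths under the default tablespace.
    """
    m = re.fullmatch(r"base/(\d+)/(\d+)", path)
    if m is None:
        raise ValueError(f"unrecognized relation path: {path!r}")
    return int(m.group(1)), int(m.group(2))


def lookup_queries(path):
    """Return (database query, relation query) for the given relation path."""
    db_oid, filenode = parse_relation_path(path)
    # Which database does the directory belong to?
    find_db = f"SELECT datname FROM pg_database WHERE oid = {db_oid};"
    # Run this one while connected to that database.
    find_rel = (
        "SELECT relname, relkind FROM pg_class "
        f"WHERE relfilenode = {filenode};"
    )
    return find_db, find_rel


if __name__ == "__main__":
    for query in lookup_queries("base/16384/2088965"):
        print(query)
```

The relfilenode can change when a relation is rewritten (for example by TRUNCATE or CLUSTER), so the lookup is only meaningful if the relation has not been rewritten since the error was logged.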

>
> In both cases, the bad block (77 in this case) is the same block that
> was intentionally partially-written during the "crash". However, that
> block should have been restored from the WAL FPW, so its fragmented
> nature should not have been present in order to be detected. Any idea
> what is going on?

Not right now. My primary suspect is what's going on in
visibilitymap_set() and heap_xlog_visible(), which is more complex than
some of the other code. That would require some VACUUM activity, which
isn't in your workload -- do you think autovacuum may kick in sometimes?

Thank you for testing! I will try to reproduce it, as well.

Regards,
Jeff Davis
