Re: "inconsistent page found" with checksum and wal_consistency_checking enabled

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "inconsistent page found" with checksum and wal_consistency_checking enabled
Date: 2017-09-20 04:52:15
Message-ID: CAB7nPqReLDKV2VGHnEEh2iGY9A+BSP8FUV7GG9RNgu2iHcHznA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal <aagrawal(at)pivotal(dot)io> wrote:
> Currently, page checksum is not masked by Page masking routines used by
> wal_consistency_checking facility. So, when running `make installcheck` with
> data checksum enabled and wal_consistency_checking='all', it easily and
> consistently FATALs with "inconsistent page found".

Indeed. This had better be fixed before PG10 is out. I am adding an open item.

> If anything needs to be masked on Page to perform / pass wal consistency
> checking, definitely checksum is not going to match and hence must be masked
> as well. Attaching patch to fix the same, installcheck passes with checksums
> enabled and wal_consistency_checking='all' with the fix.
>
> Clubbed to perform the masking with lsn as it sounds logical to have them
> together, as lsn is masked in all the cases so far and such is needed for
> checksum as well.

Agreed.

* In consistency checks, the LSN of the two pages compared will likely be
- * different because of concurrent operations when the WAL is generated
- * and the state of the page when WAL is applied.
+ * different because of concurrent operations when the WAL is generated and
+ * the state of the page when WAL is applied. Also, mask out checksum as
+ * masking anything else on page means checksum is not going to match as well.
*/
Nit: Use "the LSN and the checksum" instead of "the LSN".
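
To illustrate why the checksum has to be masked together with the LSN, here
is a minimal standalone sketch. It is not the PostgreSQL code: DemoPageHeader,
demo_checksum(), demo_mask() and demo_pages_match() are hypothetical stand-ins,
the page is only 64 bytes, and the checksum is a toy byte sum rather than the
real algorithm. It only assumes that the checksum covers the LSN field, so two
pages with the same payload but different LSNs carry different checksums.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE   64      /* toy page size, not BLCKSZ */
#define DEMO_MASK_MARKER 0xFFFF  /* value written over a masked checksum */

/* Hypothetical, simplified page header (not PageHeaderData). */
typedef struct
{
    uint64_t lsn;       /* LSN of the last WAL record touching the page */
    uint16_t checksum;  /* checksum over the rest of the page */
    uint16_t flags;
    uint32_t unused;    /* explicit filler so the struct has no hidden padding */
} DemoPageHeader;

/* Toy checksum: byte sum of the page with the checksum field zeroed. */
static uint16_t
demo_checksum(const unsigned char *page)
{
    unsigned char  copy[DEMO_PAGE_SIZE];
    DemoPageHeader hdr;
    uint16_t       sum = 0;

    memcpy(copy, page, DEMO_PAGE_SIZE);
    memcpy(&hdr, copy, sizeof(hdr));
    hdr.checksum = 0;
    memcpy(copy, &hdr, sizeof(hdr));
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++)
        sum = (uint16_t) (sum + copy[i]);
    return sum;
}

/* Build a page with a fixed payload, the given LSN, and a valid checksum. */
static void
demo_init_page(unsigned char *page, uint64_t lsn)
{
    DemoPageHeader hdr = {0};

    memset(page, 0xAB, DEMO_PAGE_SIZE);   /* identical payload on both pages */
    hdr.lsn = lsn;
    memcpy(page, &hdr, sizeof(hdr));
    hdr.checksum = demo_checksum(page);
    memcpy(page, &hdr, sizeof(hdr));
}

/* Mask the LSN, and optionally the checksum, before comparing pages. */
static void
demo_mask(unsigned char *page, int mask_checksum)
{
    DemoPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    hdr.lsn = UINT64_MAX;
    if (mask_checksum)
        hdr.checksum = DEMO_MASK_MARKER;
    memcpy(page, &hdr, sizeof(hdr));
}

/* Returns 1 if the two masked pages compare equal, 0 otherwise. */
int
demo_pages_match(int mask_checksum)
{
    unsigned char primary[DEMO_PAGE_SIZE];
    unsigned char replayed[DEMO_PAGE_SIZE];

    assert(sizeof(DemoPageHeader) <= DEMO_PAGE_SIZE);
    demo_init_page(primary, 0x100);    /* LSN when the WAL was generated */
    demo_init_page(replayed, 0x200);   /* LSN after the WAL was applied */
    demo_mask(primary, mask_checksum);
    demo_mask(replayed, mask_checksum);
    return memcmp(primary, replayed, DEMO_PAGE_SIZE) == 0;
}
```

In this sketch, demo_pages_match(0) masks only the LSN and the comparison
fails on the stale checksum bytes, while demo_pages_match(1) masks both
fields and the pages compare equal.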
--
Michael
