Re: emergency outage requiring database restart

From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: emergency outage requiring database restart
Date: 2016-10-27 07:31:18
Message-ID: CA+CSw_v2moKAxgfkVZOitXU3EJubmqRnAkbarzLue+n-h0Pj+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 26, 2016 at 8:43 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> /var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page
> verification"
> 2016-10-26 11:26:42 CDT [postgres(at)castaging]: WARNING: page
> verification failed, calculated checksum 37251 but expected 37244
> 2016-10-26 11:27:55 CDT [postgres(at)castaging]: WARNING: page
> verification failed, calculated checksum 37249 but expected 37244
> 2016-10-26 12:16:44 CDT [postgres(at)castaging]: WARNING: page
> verification failed, calculated checksum 44363 but expected 44364
> 2016-10-26 12:18:58 CDT [postgres(at)castaging]: WARNING: page
> verification failed, calculated checksum 49525 but expected 49539
> 2016-10-26 12:19:12 CDT [postgres(at)castaging]: WARNING: page
> verification failed, calculated checksum 37345 but expected 37340

The checksum values are improbably close. The checksum algorithm has
decently good mixing of all bits in the page. Having the first byte
match in 5 checksums makes this 1:2^40 improbable. What is not mixed
in properly is the block number, it only gets xor'ed before packing
the value into 16bits using modulo 0xFFFF. So I'm pretty sure
different block numbers were used for writing out and reading in the
page. Either the blocknum gets corrupted between calculating the
checksum and writing the page out (unlikely given the proximity), or
the pages are somehow getting transposed in the storage.

Regards,
Ants Aasma

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-10-27 08:08:47 Re: WAL consistency check facility
Previous Message Kyotaro HORIGUCHI 2016-10-27 07:23:37 Re: Radix tree for character conversion