Re: emergency outage requiring database restart

From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Oskari Saarenmaa <os(at)ohmu(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: emergency outage requiring database restart
Date: 2017-01-18 10:11:12
Message-ID: CA+CSw_swdjQ8H=sUq9+c5Ymy6qB3GdTYVBuxmmnZPXRJwzLf7g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 4, 2017 at 5:36 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> Still getting checksum failures. Over the last 30 days, I see the
> following. Since enabling checksums FWICT none of the damage is
> permanent and rolls back with the transaction. So creepy!

The checksums still differ only in the least significant digits, which
pretty much means that there is a block number mismatch. So if you
rule out the filesystem not doing its job correctly and transposing
blocks, it could be something else that is resulting in blocks getting
read from a location that happens to differ by a small multiple of
the page size. Maybe somebody is racily mucking with table fd's between
seeking and reading. That would explain the issue disappearing after a
retry.
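
To make the seek/read race concrete, here is a minimal sketch in Python
(not PostgreSQL code). It assumes a Unix system and a scratch file whose
blocks are filled with their own block number; the helper names and the
"interloper" callback are hypothetical stand-ins for whatever else might
be touching the descriptor. Because the file offset is shared state on
the fd, moving it between the seek and the read returns a neighbouring
block, while a positional read is unaffected.

```python
import os
import tempfile

PAGE_SIZE = 8192  # PostgreSQL's default block size


def make_file(nblocks):
    """Create a scratch file where every byte of block N has the value N."""
    fd, path = tempfile.mkstemp()
    for blkno in range(nblocks):
        os.write(fd, bytes([blkno]) * PAGE_SIZE)
    return fd, path


def skip_one_page(fd):
    """Hypothetical interloper: advance the shared file offset by one page."""
    os.lseek(fd, PAGE_SIZE, os.SEEK_CUR)


def read_block_seek(fd, blkno, interloper=None):
    """Seek, then read. Anything that moves the offset in between wins."""
    os.lseek(fd, blkno * PAGE_SIZE, os.SEEK_SET)
    if interloper is not None:
        interloper(fd)  # simulates another user of the same descriptor
    return os.read(fd, PAGE_SIZE)


def read_block_pread(fd, blkno, interloper=None):
    """Positional read: the offset is passed explicitly, not taken from the fd."""
    if interloper is not None:
        interloper(fd)
    return os.pread(fd, PAGE_SIZE, blkno * PAGE_SIZE)


if __name__ == "__main__":
    fd, path = make_file(4)
    try:
        print(read_block_seek(fd, 1)[0])                  # block 1 as requested
        print(read_block_seek(fd, 1, skip_one_page)[0])   # block 2, off by one page
        print(read_block_pread(fd, 1, skip_one_page)[0])  # block 1 again
    finally:
        os.close(fd)
        os.unlink(path)
```

A second attempt at the same read, without interference, lands on the
right block, which is consistent with the failure going away on retry.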

Maybe you can arrange for the RelFileNode and block number to be
logged for the checksum failures and check what the actual checksums
are in the data files around the failed page. If the requested block
number contains something else entirely, but the page that follows
contains the expected checksum value, then it would support this
theory.
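
A rough sketch of that check in Python, under stated assumptions: the
default 8 kB block size, the 16-bit pd_checksum field sitting right
after the 8-byte pd_lsn in the page header, a block number relative to
the start of the segment file being inspected, and a script run on the
same architecture as the server (native byte order). The function names
are made up for illustration. It only compares the checksums stored in
neighbouring pages against the value from the log; it does not recompute
anything, and with 16-bit values a chance match is possible.

```python
import struct

PAGE_SIZE = 8192        # default BLCKSZ; adjust for a non-default build
PD_CHECKSUM_OFFSET = 8  # pd_checksum follows the 8-byte pd_lsn


def stored_checksum(page):
    """Return the uint16 pd_checksum stored in a page header."""
    return struct.unpack_from("=H", page, PD_CHECKSUM_OFFSET)[0]


def neighbours_with_checksum(path, blkno, wanted, radius=4):
    """Return block numbers within `radius` of blkno whose stored
    checksum equals `wanted` (the value reported in the log)."""
    hits = []
    with open(path, "rb") as f:
        for n in range(max(0, blkno - radius), blkno + radius + 1):
            f.seek(n * PAGE_SIZE)
            page = f.read(PAGE_SIZE)
            if len(page) < PAGE_SIZE:
                break  # ran off the end of the segment file
            if stored_checksum(page) == wanted:
                hits.append(n)
    return hits
```

If the failure was logged for block N and the only hit is N+1 (or some
other close neighbour), that points at a read from the wrong offset
rather than at damaged page contents.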

Regards,
Ants Aasma
