
Re: Block-level CRC checks

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Block-level CRC checks
Date:	2009-12-01 14:40:53
Message-ID:	4B152AF5.5020703@enterprisedb.com
Lists:	pgsql-hackers

Robert Haas wrote:
> On Tue, Dec 1, 2009 at 8:30 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Bruce Momjian wrote:
>>> What might be interesting is to report CRC mismatches if the database
>>> was shut down cleanly previously; I think in those cases we shouldn't
>>> have torn pages.
>> Unfortunately that's not true. You can crash, leading to a torn page,
>> and then start up the database and shut it down cleanly. The torn page
>> is still there, even though the last shutdown was a clean one.
>
> Thinking through this, as I understand it, in order to prevent this
> problem, you'd need to be able to predict at recovery time which pages
> might have been torn by the unclean shutdown. In order to do that,
> you'd need to know which pages were waiting to be written to disk at
> the time of the shutdown. For ordinary page modifications, that's not
> a problem, because there will be WAL records for those pages that need
> to be replayed, and we could recompute the CRC at the same time. But
> for hint bit changes, there's no persistent state that would tell us
> which hint bits were in the midst of being flipped when the system
> went down, so the only way to make sure all the CRCs are correct would
> be to rescan every page in the entire cluster and recompute every CRC.
>
> Is that right?

Yep.
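The hint-bit case can be made concrete with a toy simulation. This is a minimal sketch, assuming a hypothetical page layout (a CRC-32 of the page body stored in the first four bytes) and a disk that writes 512-byte sectors atomically. It is not PostgreSQL's actual page format or write path, and the helper names are illustrative only.

```python
import zlib

PAGE_SIZE = 8192    # assumed page size (PostgreSQL's default BLCKSZ)
SECTOR_SIZE = 512   # assumed atomic write unit of the disk
CRC_BYTES = 4       # hypothetical layout: CRC of the page body in bytes 0..3


def set_crc(page: bytearray) -> None:
    """Store the CRC-32 of the page body in the (hypothetical) page header."""
    page[:CRC_BYTES] = zlib.crc32(page[CRC_BYTES:]).to_bytes(CRC_BYTES, "little")


def crc_ok(page: bytes) -> bool:
    """Return True if the stored CRC matches the page body."""
    stored = int.from_bytes(page[:CRC_BYTES], "little")
    return stored == zlib.crc32(page[CRC_BYTES:])


def torn_write(disk: bytearray, new: bytes, sectors_written: int) -> None:
    """Simulate a crash mid-write: only the first few sectors reach the disk."""
    n = sectors_written * SECTOR_SIZE
    disk[:n] = new[:n]


def hint_bit_crash_scenario() -> bool:
    """Flip a hint bit, crash mid-write, and report whether the CRC still verifies."""
    disk = bytearray(PAGE_SIZE)
    set_crc(disk)                 # consistent page on disk
    buf = bytearray(disk)         # page read into a shared buffer

    buf[6000] |= 0x01             # hint-bit change: no WAL record is written
    set_crc(buf)                  # CRC recomputed before write-out

    torn_write(disk, buf, 1)      # crash: only sector 0 (holding the new CRC) hits disk
    # No WAL record names this page, so replay never touches it, and a later
    # clean shutdown does not rewrite it either: the mismatch persists.
    return crc_ok(disk)


print(hint_bit_crash_scenario())  # False
```

The byte at offset 6000 lives in sector 11, so the torn write leaves the new CRC next to the old data. Because nothing durable records that the hint bit was being changed, recovery has no way to single out this page short of rechecking every page.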

Even if rescanning every page in the cluster were feasible from a
performance point of view, it would make the CRC checking a lot less
useful. It's not hard to imagine a hardware glitch that causes
corruption also causing the system to crash. Recalculating the CRCs
after the crash would then mask the corruption.
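The masking effect can be sketched the same way. This toy example again assumes a hypothetical layout with a CRC-32 of the page body in the first four bytes, and models the proposed post-crash "recompute every CRC" pass as a hypothetical flag; it is an illustration of the argument, not PostgreSQL code.

```python
import zlib

PAGE_SIZE = 8192
CRC_BYTES = 4   # hypothetical layout: CRC of the page body in bytes 0..3


def set_crc(page: bytearray) -> None:
    """Store the CRC-32 of the page body in the (hypothetical) page header."""
    page[:CRC_BYTES] = zlib.crc32(page[CRC_BYTES:]).to_bytes(CRC_BYTES, "little")


def crc_ok(page: bytes) -> bool:
    """Return True if the stored CRC matches the page body."""
    return int.from_bytes(page[:CRC_BYTES], "little") == zlib.crc32(page[CRC_BYTES:])


def glitch_then_recover(recompute_after_crash: bool) -> bool:
    """Corrupt a page (a glitch that also crashes the system), then optionally
    run a hypothetical startup pass that recomputes every CRC.
    Returns True if the corruption is still detectable afterwards."""
    page = bytearray(PAGE_SIZE)
    page[100:108] = b"tuple!!!"
    set_crc(page)                 # valid page before the glitch

    page[104] ^= 0xFF             # the glitch: a byte silently damaged on disk

    if recompute_after_crash:
        set_crc(page)             # CRC is "repaired" to match whatever is there

    return not crc_ok(page)


print(glitch_then_recover(False))  # True: corruption detected
print(glitch_then_recover(True))   # False: corruption masked
```

Once the CRC has been recomputed over damaged data, the page verifies cleanly and the evidence of corruption is gone, which is exactly the case CRCs were meant to catch.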

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Re: Block-level CRC checks at 2009-12-01 14:26:22 from Robert Haas

Responses

Re: Block-level CRC checks at 2009-12-01 14:46:48 from Robert Haas
Re: Block-level CRC checks at 2009-12-01 15:35:13 from Simon Riggs
