Re: Block-level CRC checks

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 15:55:54
Message-ID: 1801.1259682954@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote:
>> It's not hard to imagine that when a hardware glitch happens
>> causing corruption, it also causes the system to crash. Recalculating
>> the CRCs after crash would mask the corruption.

> They are already masked from us, so continuing to mask those errors
> would not put us in a worse position.

No, it would just destroy a large part of the argument for why this
is worth doing. "We detect disk errors ... except for ones that happen
during a database crash." "Say what?"

The fundamental problem with this is the same as it's been all along:
the tradeoff between implementation work expended, performance overhead
added, and net number of real problems detected (with a suitably large
demerit for actually *introducing* problems) just doesn't look
attractive. You can make various compromises that improve one or two of
these factors at the cost of making the others worse, but at the end of
the day I've still not seen a combination that seems worth doing.

regards, tom lane
