Re: Block-level CRC checks

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Paul Schlie <schlie(at)comcast(dot)net>, Brian Hurt <bhurt(at)janestcapital(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Block-level CRC checks
Date: 2008-10-01 17:07:22
Message-ID: 48E3AE4A.6050605@mark.mielke.cc
Lists: pgsql-hackers

Tom Lane wrote:
> Paul Schlie <schlie(at)comcast(dot)net> writes:
>
>> - yes, if you're willing to compute true CRC's as opposed to simpler
>> checksums, which may be worth the price if in fact many/most data
>> check failures are truly caused by single bit errors somewhere in the
>> chain,
>>
>
> FWIW, not one of the corrupted-data problems I've investigated has ever
> looked like a single-bit error. So the theoretical basis for using a
> CRC here seems pretty weak. I doubt we'd even consider automatic repair
> attempts anyway.
>

Single-bit failures are probably the most common, but they are probably
already handled by the hardware. I don't think I've ever seen a modern
hard drive return a wrong bit; I get short reads first. By the time
somebody notices a problem, more than a few bits have probably
accumulated. For example, if memory has a faulty cell in it, that cell
will produce a fault on some fraction of the accesses to it, so one bit
error easily turns into two, three, ... Then there is the fact that no
hardware is perfect, and every single component in the computer has a
chance, however small, of introducing bit errors... :-(
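To make the "true CRC versus simpler checksum" distinction concrete, here
is a minimal Python sketch. It assumes an 8192-byte page (PostgreSQL's
default block size) and uses a hypothetical "simple checksum" (sum of all
bytes modulo 2**16, not anything PostgreSQL actually uses) alongside
zlib's CRC-32. It only illustrates detection; it says nothing about
repair or about which scheme PostgreSQL should adopt.

```python
import zlib

# Assumption: 8192 bytes, PostgreSQL's default block size.
PAGE_SIZE = 8192


def additive_checksum(data: bytes) -> int:
    """Hypothetical 'simple checksum': sum of all bytes modulo 2**16."""
    return sum(data) % 65536


def crc32(data: bytes) -> int:
    """CRC-32 (IEEE 802.3 polynomial) as computed by zlib."""
    return zlib.crc32(data)


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


# A deterministic test page: byte i holds the value i % 256.
original = bytes(i % 256 for i in range(PAGE_SIZE))

# Case 1: a single flipped bit (byte 100, bit 3).
single = flip_bit(original, 100, 3)

# Case 2: two "compensating" flips. Byte 10 (0b00001010) loses bit 1 (-2)
# and byte 21 (0b00010101) gains bit 1 (+2), so the byte sum is unchanged.
double = flip_bit(flip_bit(original, 10, 1), 21, 1)

for name, page in [("single-bit", single), ("two-bit", double)]:
    sum_caught = additive_checksum(page) != additive_checksum(original)
    crc_caught = crc32(page) != crc32(original)
    print(f"{name}: additive detects={sum_caught}, CRC-32 detects={crc_caught}")
```

Both checks catch the single flipped bit, but the additive sum misses the
pair of compensating flips while CRC-32 catches it. CRC-32 is guaranteed
to detect every single-bit error and every error burst of 32 bits or
fewer; a plain sum gives no such guarantee once errors start to
accumulate.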

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
