Re: Block-level CRC checks

From: Paul Schlie <schlie(at)comcast(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brian Hurt <bhurt(at)janestcapital(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2008-10-01 16:27:35
Message-ID: C5091D37.14512%schlie@comcast.net
Lists: pgsql-hackers

Tom Lane wrote:
> Paul Schlie writes:
>> - yes, if you're willing to compute true CRC's as opposed to simpler
>> checksums, which may be worth the price if in fact many/most data
>> check failures are truly caused by single bit errors somewhere in the
>> chain,
>
> FWIW, not one of the corrupted-data problems I've investigated has ever
> looked like a single-bit error. So the theoretical basis for using a
> CRC here seems pretty weak. I doubt we'd even consider automatic repair
> attempts anyway.

- Although I accept that you may be correct that most errors are in fact
multi-bit, I've never seen any hard data to corroborate either that or my
own suspicion that most errors are single-bit in nature (if they occur
within the read/processing/write paths from storage). I agree that errors
arising within an otherwise ECC'd memory subsystem would have to be
multi-bit. However, in systems which record very few corrected single-bit
errors, and few if any uncorrectable double-bit errors, it seems unlikely
that multi-bit errors resulting from memory failure can account for the
number of integrity check failures seen for data stored in file systems.

So I strongly suspect that the failures you've had occasion to investigate
were predominantly those so catastrophic that they were obvious enough to
catch your attention, while most of the more subtle integrity errors simply
slipped below the radar. (It seems clear that, statistically, hardware
failure will inject single-bit errors into data more frequently than
multi-bit ones, and that these will go undetected unless each communication
boundary the data traverses is provisioned to at least detect them, if not
correct them.)
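
To make the single-bit case concrete, here is a minimal sketch of what a
true CRC allows that a simpler checksum generally does not: finding and
undoing a single flipped bit. It is only an illustration under stated
assumptions, not anything PostgreSQL does. It uses Python's zlib.crc32 as
the CRC, the function name repair_single_bit is hypothetical, and it
assumes the stored CRC itself is intact and the block is far shorter than
the period of the CRC-32 polynomial.

```python
import zlib


def repair_single_bit(block: bytes, expected_crc: int):
    """Return block with at most one bit flipped so that its CRC-32
    equals expected_crc, or None if no single-bit flip achieves that.

    Hypothetical helper for illustration; assumes expected_crc is intact.
    """
    if zlib.crc32(block) == expected_crc:
        return block  # nothing to repair

    buf = bytearray(block)
    for i in range(len(buf)):
        for bit in range(8):
            buf[i] ^= 1 << bit          # try flipping this bit
            if zlib.crc32(buf) == expected_crc:
                return bytes(buf)       # CRC matches again: repaired
            buf[i] ^= 1 << bit          # undo and keep searching
    return None                         # not a single-bit error
```

This brute-force search recomputes the CRC once per candidate bit, so it is
only sensible as a rare recovery path. If it returns None, the damage was
not a single flipped bit, which is the multi-bit case Tom describes.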
