Re: Block-level CRC checks

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	Paul Schlie <schlie(at)comcast(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brian Hurt <bhurt(at)janestcapital(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Block-level CRC checks
Date:	2008-10-01 17:32:58
Message-ID:	87wsgsz0dx.fsf@oxford.xeocode.com
Lists:	pgsql-hackers

Paul Schlie <schlie(at)comcast(dot)net> writes:

> Tom Lane wrote:
>> Paul Schlie writes:
>>> - yes, if you're willing to compute true CRC's as opposed to simpler
>>> checksums, which may be worth the price if in fact many/most data
>>> check failures are truly caused by single bit errors somewhere in the
>>> chain,
>>
>> FWIW, not one of the corrupted-data problems I've investigated has ever
>> looked like a single-bit error. So the theoretical basis for using a
>> CRC here seems pretty weak. I doubt we'd even consider automatic repair
>> attempts anyway.
>
> - although I accept that you may be correct in your assessment that most
> errors are in fact multi-bit;

I've seen bad memory in a SCSI controller cause single-bit errors in storage.
It was quite confusing since the symptom was syntax errors in the C code we
were compiling on the server. The sysadmin actually caught it reliably
corrupting a block of source text written out and read back.

I've also seen single-bit errors caused by bad memory in a network interface.
*Twice*. Particularly nasty since the checksum on TCP/IP packets is only 16 bits,
so a large enough ftp transfer would eventually finish despite the packet loss
but with the occasional bits flipped. In these days of SAN/NAS and SCSI over
IP that's pretty scary...
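
To illustrate why a 16-bit check can let flipped bits through, here is a minimal Python sketch. It assumes the TCP check is the 16-bit ones' complement sum described in RFC 1071 (a checksum rather than a true CRC) and compares it with CRC-32 from the standard zlib module. The helper names internet_checksum and flip_bit and the 4-byte sample payload are made up for this example; it is not a model of the hardware fault described above, only a demonstration that two opposite flips in the same bit position of different 16-bit words cancel in the sum.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum in the style of RFC 1071 (illustrative)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (hypothetical helper)."""
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)


# Hypothetical payload: two 16-bit words, 0x0001 and 0x0000.
original = bytes([0x00, 0x01, 0x00, 0x00])

# One flipped bit changes the sum, so the checksum notices it.
one_flip = flip_bit(original, 1, 0)

# Two flips in the same bit position of different words, one 1->0 and
# one 0->1, leave the sum unchanged, so the checksum does not notice.
two_flips = flip_bit(one_flip, 3, 0)

single_detected = internet_checksum(one_flip) != internet_checksum(original)
double_missed = internet_checksum(two_flips) == internet_checksum(original)
crc_detects_double = zlib.crc32(two_flips) != zlib.crc32(original)
```

With this payload the ones' complement sum catches the single flip but not the compensating pair, while CRC-32 catches both.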

Several cases on list have come down to "filesystem secretly replaces entire
block of data with Folger's Crystals(tm) -- let's see if the database
notices". Any checksum would help in that case, but I wouldn't discount
single-bit errors either.
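
As a rough sketch of the two failure modes mentioned here, the following Python snippet stores a CRC-32 alongside a block and re-checks it on read. The 8192-byte block size, the seal and verify helpers, and the sample contents are assumptions made for illustration; this is not a description of how PostgreSQL does or would implement block checks.

```python
import zlib

BLOCK_SIZE = 8192  # assumed block size for this example


def seal(block: bytes) -> int:
    """Compute the value to store next to the block (hypothetical helper)."""
    return zlib.crc32(block)


def verify(block: bytes, stored_crc: int) -> bool:
    """Recompute the CRC on read and compare it with the stored value."""
    return zlib.crc32(block) == stored_crc


# Deterministic sample block standing in for a database page.
block = bytes(i % 251 for i in range(BLOCK_SIZE))
stored = seal(block)

# Failure mode 1: the whole block is replaced with unrelated data.
replaced = bytes(BLOCK_SIZE)  # all zeros

# Failure mode 2: a single bit is flipped somewhere in the block.
flipped = bytearray(block)
flipped[4000] ^= 0x10
flipped = bytes(flipped)

intact_ok = verify(block, stored)
replacement_caught = not verify(replaced, stored)
bit_flip_caught = not verify(flipped, stored)
```

A CRC is guaranteed to catch any single-bit error, whereas a wholesale replacement is caught by essentially any reasonable checksum, which is the distinction being drawn above.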

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

In response to

Re: Block-level CRC checks at 2008-10-01 16:27:35 from Paul Schlie

Responses

Re: Block-level CRC checks at 2008-10-02 07:41:47 from Florian Weimer
