Re: Block-level CRC checks

From: Paul Schlie <schlie(at)comcast(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2008-10-01 04:43:57
Message-ID: C508784D.144F4%schlie@comcast.net
Lists: pgsql-hackers

If you are concerned with data integrity (beyond corruption caused by bugs in the code itself), you may be interested in utilizing ZFS. Be aware, however, that I found and reported a bug in its implementation of the Fletcher checksum algorithm it uses by default to verify the integrity of the data stored in the file system. Be aware also that checksums/CRCs do not, in general, enable the correction of errors, so be prepared to decide what should be done in the event of a failure: in certain circumstances ZFS effectively locks up rather than risk silently using suspect data without some form of persistent indication that the result may be corrupted. (Strong CRCs and FECs are relatively inexpensive to compute.)

So in summary, my two cents: a properly implemented 32/64-bit Fletcher checksum is likely adequate to detect most errors, and even to correct them when the error is presumed to be a single flipped bit within a block of 128KB or so (such a Fletcher checksum has a Hamming distance of 3 within blocks of this size), albeit fairly expensively, by trial and error. Presuming further that this cannot be relied upon, a strategy may need to be adopted that uses the suspect data as if it were good, accompanied somehow by a persistent indication that the query results (or specific sub-results) are themselves suspect, as this may often be a lesser evil than the alternative (but not always). Or use a file system like ZFS, let it do its thing, and hope for the best.
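To make the trial-and-error idea above concrete, here is a minimal Python sketch (my own illustration, not code from ZFS or PostgreSQL) of a Fletcher-32 checksum together with single-bit repair by brute force: flip each bit in turn and re-checksum until the stored value matches. Function names and the block contents are hypothetical.

```python
def fletcher32(data: bytes) -> int:
    """Fletcher-32 over 16-bit big-endian words, zero-padded to even length.
    Sums are taken modulo 65535, per the usual Fletcher construction."""
    if len(data) % 2:
        data += b"\x00"
    s1 = s2 = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]
        s1 = (s1 + word) % 65535
        s2 = (s2 + s1) % 65535
    return (s2 << 16) | s1


def correct_single_bit(block: bytes, expected: int):
    """Trial-and-error repair: flip each bit in turn and re-checksum.
    O(bits_in_block * block_size) work -- workable for small blocks, but
    'fairly expensive', as noted above. Returns the repaired block, or
    None if no single-bit flip reproduces the expected checksum."""
    for byte_idx in range(len(block)):
        for bit in range(8):
            candidate = bytearray(block)
            candidate[byte_idx] ^= 1 << bit
            if fletcher32(bytes(candidate)) == expected:
                return bytes(candidate)
    return None


# Illustration: corrupt one bit, then recover it from the checksum alone.
good = b"hypothetical block payload"
stored_checksum = fletcher32(good)

damaged = bytearray(good)
damaged[3] ^= 0x10  # simulate a single flipped bit
repaired = correct_single_bit(bytes(damaged), stored_checksum)
assert repaired == good
```

Because the checksum's Hamming distance is at least 3 over blocks of this size, no *other* single-bit flip can reproduce the stored checksum, which is what makes this kind of repair unambiguous; for multi-bit errors it fails (returns None), and the "use the suspect data but flag it" strategy above becomes the fallback.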
