Re: Block-level CRC checks

From: Paul Schlie <schlie(at)comcast(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2008-10-01 06:57:47
Message-ID: C50897AB.144F7%schlie@comcast.net
Lists: pgsql-hackers

> Joshua D. Drake wrote:
> ...
> ZFS is not an option; generally speaking.

Then in general, if the corruption occurred within the:

- read path, try again and hope it takes care of itself.

- write path, the best that can be hoped for is a single-bit error
within the data itself, which can be both detected and corrected
with a sufficiently strong checksum; in the worst case, if address or
control information was corrupted, god knows what happened to the
data, and other data may have been destroyed by the data being
written to the wrong blocks, which is typically unrecoverable.

- drive itself, this is very unlikely, as strong FEC codes
typically prevent unrecoverable data from being misidentified
as good.

The simplest thing to do would seem to be to verify the checksum
when a block is read and, if it is bad, try the read again. If that
doesn't fix the problem, assume a single-bit error and iteratively
flip single bits until the checksum matches (hopefully not making
the problem worse, as may be the case if many bits were actually
in error), then write the data back and proceed as normal, possibly
logging the action. Otherwise, presume the data is unrecoverable
and somehow mark it as such, so that subsequent queries which use
any portion of it know it may be corrupt. My best initial guess is
that this marking is best done not on file-system blocks but on
logical rows, or even on individual entries if they are very large.
That is likely to measurably affect performance when enabled, and I
haven't a clue how a resulting query should or could be identified
as potentially corrupt without confusing the client which requested
it.
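
As a rough illustration of the read, retry, and bit-flip procedure
described above, here is a minimal sketch in Python. It is not
PostgreSQL code: the names read_checked, repair_single_bit and
read_block are hypothetical, and CRC-32 from zlib is only an assumed
stand-in for whatever checksum would actually be stored with a block.

```python
import zlib
from typing import Callable, Optional, Tuple


def repair_single_bit(block: bytes, expected_crc: int) -> Optional[bytes]:
    """Flip each bit in turn; return the first copy whose CRC matches.

    Returns None if no single-bit flip reproduces expected_crc.
    (Hypothetical helper, assuming a CRC-32 stored alongside the block.)
    """
    buf = bytearray(block)
    for byte_index in range(len(buf)):
        for bit in range(8):
            mask = 1 << bit
            buf[byte_index] ^= mask          # try this bit flipped
            if zlib.crc32(buf) == expected_crc:
                return bytes(buf)
            buf[byte_index] ^= mask          # undo and move on
    return None


def read_checked(read_block: Callable[[], bytes],
                 expected_crc: int) -> Tuple[Optional[bytes], str]:
    """Read a block, retry once on mismatch, then attempt a repair.

    read_block is a hypothetical callable that performs one physical read.
    The returned status string is what a caller might log or act upon.
    """
    data = read_block()
    if zlib.crc32(data) == expected_crc:
        return data, "ok"

    data = read_block()                      # the read path may have been at fault
    if zlib.crc32(data) == expected_crc:
        return data, "ok-after-retry"

    fixed = repair_single_bit(data, expected_crc)
    if fixed is not None:
        return fixed, "repaired"             # caller would write back and log

    return None, "unrecoverable"             # caller would mark the data as suspect
```

The search costs one checksum computation per bit in the block, so
it is only worth paying on the rare failure path. It also carries
the risk noted above: if several bits are actually wrong, a single
flip could in principle produce a matching checksum over data that
is still incorrect, so a "repaired" result is a best guess rather
than a guarantee.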
