Re: Block-level CRC checks

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2008-11-16 11:08:57
Message-ID: 20081116110857.GB25476@svana.org
Lists: pgsql-hackers

On Fri, Nov 14, 2008 at 10:51:57AM -0500, Tom Lane wrote:
> In fact, if the patch were to break torn-page handling, it would be
> 100% likely to be a net *decrease* in system reliability. It would add
> detection of a situation that is not supposed to happen (ie, storage
> system fails to return the same data it stored) at the cost of breaking
> one's database when the storage system acts as it's expected and
> documented to in a routine power-loss situation.

Ok, I see it's a problem because the hint-bit changes are not WAL-logged,
so torn pages are expected, and must be tolerated, in normal operation.
But simply skipping the hint bits during checksumming is a terrible
solution, since then any errors in those bits will go undetected. Not
being able to say in the documentation that you'll detect 100% of
single-bit errors is pretty darn terrible, since that's kind of the goal
of the exercise.
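To make the objection concrete, here is a minimal sketch (not code from the patch under discussion) of a CRC-32 that skips hint bits. The per-byte `skip` mask and the name `crc32_skip` are hypothetical; real heap hint bits live in tuple headers (`t_infomask`), which a real implementation would have to locate by walking the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * CRC-32 (reflected, polynomial 0xEDB88320, as used by zlib) over a page,
 * with the bits set in skip[i] forced to zero before hashing byte i.
 * skip == NULL checksums every bit. skip[] is a hypothetical hint-bit mask.
 */
static uint32_t crc32_skip(const uint8_t *buf, const uint8_t *skip, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = skip ? (uint8_t)(buf[i] & ~skip[i]) : buf[i];
        crc ^= byte;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Flipping a single masked bit leaves `crc32_skip(page, skip, len)` unchanged, while `crc32_skip(page, NULL, len)` always changes, because a CRC-32 detects every single-bit error. That guarantee is exactly what is lost once the hint bits are excluded.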

Unfortunately, there aren't a lot of easy solutions here. You could keep
two checksums, one with and one without the hint bits. The overall
checksum tells you whether there's a problem. If it doesn't match, the
second checksum will tell you whether it's the hint bits or not (the
torn-page problem). If it's the hint bits, you can reset them all and
continue. The checksums need not be of equal strength.
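A minimal sketch of the two-checksum scheme, under stated assumptions: the page is a plain byte array, `hint[i]` is a hypothetical mask of the hint bits in byte i, and the checksums are stored outside the page rather than in its header. The strong checksum is CRC-32 over every bit; the weaker one is CRC-16/CCITT-FALSE with the hint bits masked out, to illustrate that the two need not be of equal strength. Names such as `verify_page` are illustrative, not PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Strong checksum over the whole page, hint bits included: CRC-32 (zlib polynomial). */
static uint32_t crc32_full(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Weaker checksum with hint bits masked out: CRC-16/CCITT-FALSE. NULL hint = no mask. */
static uint16_t crc16_nohint(const uint8_t *buf, const uint8_t *hint, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = hint ? (uint8_t)(buf[i] & ~hint[i]) : buf[i];
        crc ^= (uint16_t)(byte << 8);
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

typedef struct {
    uint32_t full;   /* covers every bit, hint bits included */
    uint16_t nohint; /* ignores hint bits */
} page_checksums;

typedef enum { PAGE_OK, PAGE_HINTS_RESET, PAGE_CORRUPT } page_status;

static page_checksums compute_checksums(const uint8_t *page, const uint8_t *hint,
                                        size_t len)
{
    page_checksums cs = { crc32_full(page, len), crc16_nohint(page, hint, len) };
    return cs;
}

/*
 * Verify a page read back from disk. If only the hint bits disagree (the
 * torn-page case), clear every hint bit and carry on; the caller must then
 * recompute and store fresh checksums before the page is written out again.
 */
static page_status verify_page(uint8_t *page, const uint8_t *hint, size_t len,
                               page_checksums stored)
{
    assert(page != NULL && hint != NULL);
    if (crc32_full(page, len) == stored.full)
        return PAGE_OK;
    if (crc16_nohint(page, hint, len) != stored.nohint)
        return PAGE_CORRUPT;
    for (size_t i = 0; i < len; i++)
        page[i] &= (uint8_t)~hint[i];
    return PAGE_HINTS_RESET;
}
```

Resetting hint bits is safe in principle because they only cache transaction commit status, which is looked up again on the next visibility check; the cost is repeating those lookups. Since both CRCs detect every single-bit error, a single flipped non-hint bit always ends in `PAGE_CORRUPT` rather than being silently accepted.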

The extreme case is an ECC (error-correcting code) constructed so that
you can alter up to N bits before you need to recalculate the checksum.
Computationally, though, that sucks.

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
