Re: Block-level CRC checks

From: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
To: Decibel! <decibel(at)decibel(dot)org>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2008-09-30 22:49:17
Message-ID: DC9A3FF3-F056-449A-926F-CD1F57740042@enterprisedb.com
Lists: pgsql-hackers

On 30 Sep 2008, at 10:17 PM, Decibel! <decibel(at)decibel(dot)org> wrote:

> On Sep 30, 2008, at 1:48 PM, Heikki Linnakangas wrote:
>> This has been suggested before, and the usual objection is
>> precisely that it only protects from errors in the storage layer,
>> giving a false sense of security.
>
> If you can come up with a mechanism for detecting non-storage errors
> as well, I'm all ears. :)
>
> In the meantime, you're way, way more likely to experience
> corruption at the storage layer than anywhere else.

FWIW, this hasn't been my experience. Bad memory is extremely common,
and even the storage failures I've seen (excluding outright drive
crashes) turned out to be caused by bad memory.

That said, I've always been interested in doing this. The main use case
in my mind has been data restored from old backups that have been lying
around and moving between machines for a while, with many opportunities
for bit errors to creep in.

The main stumbling block I ran into was how to handle turning the
option on and off. I wanted it to be possible to turn the option off,
so that the database ignores any checksum errors and avoids the overhead.

But that means including an escape-hatch value that is always
considered correct, and that dramatically reduces the effectiveness
of the scheme.
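
A minimal sketch of why the escape hatch hurts, not taken from any actual
patch. The page layout (a 4-byte CRC in front of the payload), the sentinel
value NO_CHECKSUM = 0, and the function names are all assumptions made for
illustration. It shows that a page whose checksum field has been wiped to
the sentinel passes verification even though its contents are corrupt.

```python
import zlib

CRC_BYTES = 4
NO_CHECKSUM = 0  # hypothetical sentinel meaning "page written with checks off"


def page_crc(payload: bytes) -> int:
    """CRC-32 of the payload, remapped so it never equals the sentinel."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return 1 if crc == NO_CHECKSUM else crc


def write_page(payload: bytes, checksums_enabled: bool) -> bytes:
    """Prefix the payload with its CRC, or with the sentinel if checks are off."""
    stored = page_crc(payload) if checksums_enabled else NO_CHECKSUM
    return stored.to_bytes(CRC_BYTES, "little") + payload


def verify_page(page: bytes) -> bool:
    """Return True if the page is accepted as valid."""
    stored = int.from_bytes(page[:CRC_BYTES], "little")
    if stored == NO_CHECKSUM:
        return True  # escape hatch: always considered correct
    return stored == page_crc(page[CRC_BYTES:])


payload = b"x" * 100
page = write_page(payload, checksums_enabled=True)

# Flip one bit in the payload: the CRC catches it.
corrupted = page[:10] + bytes([page[10] ^ 1]) + page[11:]

# Same corruption, but the checksum field was also zeroed
# (for example by a zero-filled sector): the page is accepted.
zeroed = bytes(CRC_BYTES) + corrupted[CRC_BYTES:]
```

Here verify_page(corrupted) is False while verify_page(zeroed) is True, so
any corruption that also sets the checksum field to the sentinel goes
undetected.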

Another issue is that it reduces the space available on each page,
which makes in-place upgrades harder.
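
A rough sketch of the space problem, under stated assumptions: an 8192-byte
page (PostgreSQL's default block size), a 24-byte existing page header, and
a hypothetical 4 extra bytes for the checksum. The function names are
invented for illustration. A page that is already nearly full has nowhere to
put the new field without moving tuples elsewhere.

```python
PAGE_SIZE = 8192  # default PostgreSQL block size


def free_space(header_bytes: int, used_bytes: int) -> int:
    """Bytes left on a page after the header and existing data."""
    return PAGE_SIZE - header_bytes - used_bytes


def can_upgrade_in_place(header_bytes: int, used_bytes: int, extra: int) -> bool:
    """True if the page has room for `extra` more header bytes as it stands."""
    return free_space(header_bytes, used_bytes) >= extra


OLD_HEADER = 24      # assumed size of the existing page header
CHECKSUM_BYTES = 4   # hypothetical size of the added checksum field

roomy = can_upgrade_in_place(OLD_HEADER, 8000, CHECKSUM_BYTES)  # 168 bytes free
full = can_upgrade_in_place(OLD_HEADER, 8166, CHECKSUM_BYTES)   # 2 bytes free
```

The first page can take the checksum in place; the second cannot, so an
in-place upgrade would have to relocate data first.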

If you can deal with those issues, and carefully handle the
contingencies so it's clear to people what to do when errors occur or
when they want to turn the feature on or off, then I'm all for it.
That's despite my experience that memory errors are a lot more common
than undetected storage errors.
