Re: Enabling Checksums

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-03-17 20:41:40
Message-ID: CA+U5nMLwTzbL=rCF5UMatjA9529StyOosL+c3KcSAde6bW_GRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 March 2013 00:41, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On 15 March 2013 13:08, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> I commented on this before, I personally think this property makes fletcher a
>>> not so good fit for this. Its not uncommon for parts of a block being all-zero
>>> and many disk corruptions actually change whole runs of bytes.
>
>> I think you're right to pick up on this point, and Ants has done a
>> great job of explaining the issue more clearly.
>
>> My perspective, after some thought, is that this doesn't matter to the
>> overall effectiveness of this feature.
>
>> PG blocks do have large runs of 0x00 in them, though that is in the
>> hole in the centre of the block. If we don't detect problems there,
>> its not such a big deal. Most other data we store doesn't consist of
>> large runs of 0x00 or 0xFF as data. Most data is more complex than
>> that, so any runs of 0s or 1s written to the block will be detected.
>
> Meh. I don't think that argument holds a lot of water. The point of
> having checksums is not so much to notice corruption as to be able to
> point the finger at flaky hardware. If we have an 8K page with only
> 1K of data in it, and we fail to notice that the hardware dropped a lot
> of bits in the other 7K, we're not doing our job; and that's not really
> something to write off, because it would be a lot better if we complain
> *before* the hardware manages to corrupt something valuable.
>
> So I think we'd be best off to pick an algorithm whose failure modes
> don't line up so nicely with probable hardware failure modes. It's
> worth noting that one of the reasons that CRCs are so popular is
> precisely that they were designed to detect burst errors with high
> probability.

I think that's a reasonable refutation of my argument, so I will
relent, especially since nobody's +1'd me.

>> What I think we could do here is to allow people to set their checksum
>> algorithm with a plugin.
>
> Please, no. What happens when their plugin goes missing? Or they
> install the wrong one on their multi-terabyte database? This feature is
> already on the hairy edge of being impossible to manage; we do *not*
> need to add still more complication.

Agreed. (And thanks for saying please!)

So I'm now moving towards commit using a CRC algorithm. I'll put in a
feature to allow algorithm be selected at initdb time, though that is
mainly a convenience to allow us to more easily do further testing on
speedups and whether there are any platform specific regressions
there.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2013-03-17 20:45:31 Re: Enabling Checksums
Previous Message Boszormenyi Zoltan 2013-03-17 19:49:13 Re: Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]