Re: Block-level CRC checks

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 07:52:46
Message-ID: 1259653966.13774.11898.camel@ebony
Lists: pgsql-hackers

On Mon, 2009-11-30 at 20:02 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Mon, 2009-11-30 at 16:49 -0500, Aidan Van Dyk wrote:
> >> No, I believe the torn-page problem is exactly the thing that made the
> >> checksum talks stall out last time...  The torn page isn't currently a
> >> problem on only-hint-bit-dirty writes, because if you get
> >> half-old/half-new, the only change is the hint bit - no big loss, the
> >> data is still the same.
> > A good argument, but we're missing some proportion.
> No, I think you are.  The problem with the described behavior is exactly
> that it converts a non-problem into a problem --- a big problem, in
> fact: uncorrectable data loss.  Loss of hint bits is expected and
> tolerated in the current system design.  But a block with bad CRC is not
> going to have any automated recovery path.
> So the difficulty is that in the name of improving system reliability
> by detecting infrequent corruption events, we'd be decreasing system
> reliability by *creating* infrequent corruption events, added onto
> whatever events we were hoping to detect.  There is no strong argument
> you can make that this isn't a net loss --- you'd need to pull some
> error-rate numbers out of the air to even try to make the argument,
> and in any case the fact remains that more data gets lost with the CRC
> than without it.  The only thing the CRC is really buying is giving
> the PG project a more plausible argument for blaming data loss on
> somebody else; it's not helping the user whose data got lost.
> It's hard to justify the amount of work and performance hit we'd take
> to obtain a "feature" like that.

I think there is a clear justification for an additional option.

There is no "creation" of corruption events. This scheme detects
corruption events that *have* occurred. Now I understand that we
previously would have recovered seamlessly from such events, but they
were corruption events nonetheless and I think they need to be reported.
(For why, see Conclusion #2, below).
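
To make the scenario under discussion concrete, here is a minimal Python
sketch (not PostgreSQL code) of a hint-bit-only write that tears under a
block-level CRC. The 8192-byte block, the 512-byte sector, the position of
the CRC field and of the "hint bit", and all helper names are assumptions
made up for illustration; zlib.crc32 merely stands in for whatever checksum
would actually be chosen.

```python
import zlib

BLOCK_SIZE = 8192    # assumed page size
SECTOR_SIZE = 512    # assumed atomic write unit of the disk
CRC_LEN = 4          # hypothetical: CRC stored in the first 4 bytes of the block
HINT_OFFSET = 6000   # hypothetical byte holding a hint bit, in a later sector


def block_crc(block):
    """CRC over the block contents, excluding the stored CRC field itself."""
    return zlib.crc32(bytes(block[CRC_LEN:]))


def stamp(block):
    """Store the current CRC in the block header."""
    block[0:CRC_LEN] = block_crc(block).to_bytes(CRC_LEN, "big")


def verify(block):
    """Return True if the stored CRC matches the block contents."""
    return int.from_bytes(block[0:CRC_LEN], "big") == block_crc(block)


def torn_write(old, new, sectors_written):
    """Simulate a crash after only the first sectors of `new` reached disk."""
    cut = sectors_written * SECTOR_SIZE
    return bytearray(new[:cut] + old[cut:])


# Old version of the block, as it sits on disk with a valid CRC.
old = bytearray(BLOCK_SIZE)
stamp(old)

# New version: identical except for one hint bit (and therefore the CRC).
new = bytearray(old)
new[HINT_OFFSET] |= 0x01
stamp(new)

# Crash after the first sector: new CRC on disk, old hint bit on disk.
on_disk = torn_write(old, new, sectors_written=1)

print("old block verifies: ", verify(old))
print("new block verifies: ", verify(new))
print("torn block verifies:", verify(on_disk))
print("torn block data equals old data:", on_disk[CRC_LEN:] == old[CRC_LEN:])
```

In this sketch the torn block fails verification even though everything
outside the CRC field is byte-for-byte the old, valid data. That is the
event being argued about: undetected and harmless today, reported once a
block-level CRC exists.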

The frequency of such events against other corruption events is
important here. You are right that there is effectively one new *type*
of corruption event, but without error-rate numbers you can't say that
this shows substantially "more data gets lost with the CRC than without
it".

So let me say this again: the argument that inaction is a safe response
here relies upon error-rate numbers going in your favour. You don't
persuade us of one argument purely by observing that the alternate
proposition requires a certain threshold error-rate - both propositions
do. So it's a straight "what is the error-rate?" discussion, and ISTM
that there is good evidence of what that is.


So, what is the probability of single-bit errors affecting hint bits?
The hint bits can occupy any portion of the block, so their positions
are random. They occupy less than 0.5% of the block, so they must
account for a very small proportion of hardware-induced errors.

Since most reasonable servers use Error Correcting Memory, I would
expect not to see a high level of single bit errors, even though we know
they are occurring in the underlying hardware (Conclusion #1, Schroeder
et al, 2009).

What is the chance that a correctable corruption event is in no way
linked to another non-correctable event later? We would need to argue
that corruptions are a purely stochastic process in all cases, yet
again, there is evidence of a clear and strong linkage from
correctable to non-correctable errors (Conclusion #2 and Conclusion
#7, Schroeder et al, 2009).

Schroeder et al
(thanks Greg!)

Based on that paper, ISTM that ignorable hint bit corruptions are likely
to account for a very small proportion of all corruptions, and of those,
"70-80%" would show up as non-ignorable corruptions within a month
anyway. So the immediate effect on reliability is tiny, if any. The
effect on detection is huge, which eventually produces significantly
higher reliability overall.
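
As a back-of-envelope sketch of that arithmetic (Python, purely
illustrative): it assumes bit errors land uniformly across the block,
treats the 0.5% hint-bit share as an upper bound, and applies the paper's
"70-80%" figure in the same way as the paragraph above. The function and
constant names are hypothetical; the inputs are the figures quoted in this
mail, not new measurements.

```python
HINT_BIT_SHARE = 0.005   # upper bound from above: hint bits < 0.5% of the block


def ignorable_and_isolated_share(hint_share, followed_by_hard_error):
    """Share of all corruption events that hit only hint bits AND are not
    followed by a non-ignorable corruption within the month."""
    return hint_share * (1.0 - followed_by_hard_error)


for followed in (0.70, 0.80):
    share = ignorable_and_isolated_share(HINT_BIT_SHARE, followed)
    print(f"followed-by-hard-error rate {followed:.0%}: "
          f"at most {share:.2%} of corruption events")
```

Under those assumptions the events that a CRC would turn from "silently
tolerated" into "reported, with nothing worse following" come out at
roughly 0.1% to 0.15% of all corruption events.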

> The only thing the CRC is really buying is giving
> the PG project a more plausible argument for blaming data loss on
> somebody else; it's not helping the user whose data got lost.

This isn't about blame, it's about detection. If we know something has
happened we can do something about it. Experienced people know that
hardware goes wrong, they just want to be told so they can fix it.

I blocked development of a particular proposal earlier for performance
reasons, but did not intend to block progress completely. It seems
likely the checks will cause a performance hit. So make them an option.

 Simon Riggs 
