Re: Block-level CRC checks

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-11-30 23:28:21
Message-ID: 1259623701.13774.10218.camel@ebony
Lists: pgsql-hackers

On Mon, 2009-11-30 at 16:49 -0500, Aidan Van Dyk wrote:
> * Simon Riggs <simon(at)2ndQuadrant(dot)com> [091130 16:28]:
> >
> > You've written that as if you are spotting a problem. It sounds to me
> > that this is exactly the situation we would like to detect and this is a
> > perfect way of doing that.
> >
> > What do you see is the purpose here apart from spotting corruptions?
> >
> > Do we think error rates are so low we can recover the corruption by
> > doing something clever with the CRC? I envisage most corruptions as
> > being unrecoverable except from backup/WAL/replicated servers.
> >
> > It's been a long day, so perhaps I've misunderstood.
>
> No, I believe the torn-page problem is exactly the thing that made the
> checksum talks stall out last time... The torn page isn't currently a
> problem on only-hint-bit-dirty writes, because if you get
> half-old/half-new, the only change is the hint bit - no big loss, the
> data is still the same.
>
> But, with a form of checksums, when you read it the next time, is it
> corrupt? According to the checksum, yes, but in reality the *data* is
> still valid; the checksum just no longer matches because of the
> half-changed hint bits...

A good argument, but we're missing a sense of proportion here.

There are at most 240 hint bits in an 8192-byte block. So less than 0.5%
of the block is somewhere a single-bit error would not corrupt data, and
there is nowhere in the block that a 2+ bit error would not corrupt
data. Put another way, more than 99.5% of possible errors would cause
data loss, so I would at least like the option of being told about them.
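
For anyone checking the arithmetic, a one-liner in C (240 hint bits per
block is the figure assumed above, not a value from the source tree):

    #include <stdio.h>

    int
    main(void)
    {
        const double block_bits = 8192 * 8;     /* 65536 bits per page */
        const double hint_bits = 240;           /* at most, per block */

        /* prints 0.37%, comfortably under the 0.5% quoted above */
        printf("%.2f%% of the block is hint bits\n",
               100.0 * hint_bits / block_bits);
        return 0;
    }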

The other perspective is that these errors are unlikely to be caused by
cosmic rays and other quantum effects; they are more likely to be caused
by hardware faults. Hardware faults are frequently repeatable: one bank
of memory or one section of DRAM is damaged and will keep giving errors.
If we don't report an error, the next error from that piece of hardware
is almost certain to cause data loss, so even a false positive should be
treated as a good predictor of a true positive in the future.

If protection against data loss really does need to be so invasive that
we need to WAL-log all changes, then let's make it a table-level option.
If people want to pay the price, we should at least give them the option
of doing so; we can think of ways of optimising it later. Since I was
the one who opposed this on performance grounds, I want to rescind that
objection and say let's make it an option for those who wish to trade
performance for some visibility of possible data-loss errors.
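
To sketch what that table-level trade-off might look like (every name
here is hypothetical; this is not PostgreSQL code, just the shape of the
decision): tables that opt in WAL-log hint-bit changes so the block
checksum stays trustworthy through a torn write, while everything else
keeps the current fast path.

    #include <stdbool.h>
    #include <stdio.h>

    /* hypothetical per-table option; not a real PostgreSQL reloption */
    typedef struct TableOptions
    {
        bool checksum_protected;
    } TableOptions;

    static void
    set_hint_bit(const TableOptions *opts)
    {
        if (opts->checksum_protected)
        {
            /* opted in: WAL-log the change (paying the performance
             * price) so a torn write can be repaired from WAL and the
             * block CRC stays trustworthy */
            printf("WAL-log the hint-bit change, then set it\n");
        }
        else
        {
            /* current behaviour: set the bit with no WAL record */
            printf("set the hint bit without WAL\n");
        }
    }

    int
    main(void)
    {
        TableOptions opted_in = { true };
        TableOptions opted_out = { false };

        set_hint_bit(&opted_in);
        set_hint_bit(&opted_out);
        return 0;
    }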

--
Simon Riggs www.2ndQuadrant.com
