Re: Block-level CRC checks

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 01:02:10
Message-ID: 15106.1259629330@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, 2009-11-30 at 16:49 -0500, Aidan Van Dyk wrote:
>> No, I believe the torn-page problem is exactly the thing that made the
>> checksum talks stall out last time... The torn page isn't currently a
>> problem on only-hint-bit-dirty writes, because if you get
>> half-old/half-new, the only changes is the hint bit - no big loss, the
>> data is still the same.

> A good argument, but we're missing some proportion.

No, I think you are. The problem with the described behavior is exactly
that it converts a non-problem into a problem --- a big problem, in
fact: uncorrectable data loss. Loss of hint bits is expected and
tolerated in the current system design. But a block with bad CRC is not
going to have any automated recovery path.

So the difficulty is that in the name of improving system reliability
by detecting infrequent corruption events, we'd be decreasing system
reliability by *creating* infrequent corruption events, added onto
whatever events we were hoping to detect. There is no strong argument
you can make that this isn't a net loss --- you'd need to pull some
error-rate numbers out of the air to even try to make the argument,
and in any case the fact remains that more data gets lost with the CRC
than without it. The only thing the CRC is really buying is giving
the PG project a more plausible argument for blaming data loss on
somebody else; it's not helping the user whose data got lost.

It's hard to justify the amount of work and performance hit we'd take
to obtain a "feature" like that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-12-01 01:11:35 Re: ProcessUtility_hook
Previous Message Andres Freund 2009-12-01 00:26:32 Re: Application name patch - v4