Re: Block-level CRC checks

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 18:02:07
Message-ID: 1259690527.26322.30.camel@jd-desktop.unknown.charter.com
Lists: pgsql-hackers

On Tue, 2009-12-01 at 10:55 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote:
> >> It's not hard to imagine that when a hardware glitch happens
> >> causing corruption, it also causes the system to crash. Recalculating
> >> the CRCs after crash would mask the corruption.
>
> > They are already masked from us, so continuing to mask those errors
> > would not put us in a worse position.
>
> No, it would just destroy a large part of the argument for why this
> is worth doing. "We detect disk errors ... except for ones that happen
> during a database crash." "Say what?"
>
> The fundamental problem with this is the same as it's been all along:
> the tradeoff between implementation work expended, performance overhead
> added, and net number of real problems detected (with a suitably large
> demerit for actually *introducing* problems) just doesn't look
> attractive. You can make various compromises that improve one or two of
> these factors at the cost of making the others worse, but at the end of
> the day I've still not seen a combination that seems worth doing.

Let me try a different but similar perspective. The problem we are
trying to solve here matters only to a very small subset of the people
actually using PostgreSQL: specifically, those who are running
PostgreSQL in a situation where they can lose many thousands of dollars
per minute or hour should an outage occur.

On the other hand, it is those very people who are *paying* people to
try to implement these features. Kind of a catch-22.

The hard-core reality is this. *IF* it is one of the goals of this
project to ensure that the software can be safely, effectively, and
responsibly operated in a manner that is acceptable to C*-level people
in a Fortune-level company, then we *must* solve this problem.

If it is not a goal of the project, leave it to EDB/CMD/2ndQuadrant
to fork it, because it will eventually happen. Our customers are
demanding these features.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
