Re: Block-level CRC checks

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Block-level CRC checks
Date:	2009-12-01 12:05:47
Message-ID:	200912011205.nB1C5l818454@momjian.us
Lists:	pgsql-hackers

Simon Riggs wrote:
> > I think
> > the problem is that the existing proposal can't distinguish between
> > these two cases so the user has no idea how to respond to the report.
>
> If 99.5% of cases are real corruption then there is little need to
> distinguish between the cases, nor much value in doing so. The
> prevalence of the different error types is critical to understanding how
> to respond.
>
> If a man pulls a gun on you, your first thought isn't "some people
> remove guns from their jacket to polish them, so perhaps he intends to
> polish it now" because the prevalence of shootings is high, when faced
> by people with guns, and the risk of dying is also high. You make a
> judgement based upon the prevalence and the risk.
>
> That is all I am asking for us to do here, make a balanced call. These
> recent comments are a change in my own position, based upon evaluating
> the prevalence and the risk. I ask others to consider the same line of
> thought rather than a black/white assessment.
>
> All useful detection mechanisms have non-zero false positives because we
> would rather sometimes ring the bell for no reason than to let bad
> things through silently, as we do now.

OK, but what happens if someone gets the failure report, assumes their
hardware is faulty and replaces it, and then gets a failure report
again? I assume torn pages, which are expected and are fixed, are 99%
of the reported problems, and bad hardware 1%, quite the opposite of
your numbers above.

What might be interesting is to report CRC mismatches if the database
was shut down cleanly previously; I think in those cases we shouldn't
have torn pages.
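
A rough sketch of that idea, to make the proposed rule concrete. This is an illustration only, not PostgreSQL code: the `Page` type, the `classify_page` function, and the `clean_shutdown` flag are all hypothetical names, and `zlib.crc32` simply stands in for whatever checksum would actually be used.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page: raw contents plus the CRC stored when it was written."""
    data: bytes
    stored_crc: int


def classify_page(page: Page, clean_shutdown: bool) -> str:
    """Classify a page under the assumed rule.

    Returns "ok" when the CRC matches, "report" when it mismatches after a
    clean shutdown (no torn pages expected, so the mismatch is suspicious),
    and "possibly-torn" when it mismatches after an unclean shutdown.
    """
    if zlib.crc32(page.data) == page.stored_crc:
        return "ok"
    if clean_shutdown:
        return "report"
    return "possibly-torn"


# Example pages: one intact, one whose contents changed after the CRC was stored.
good = Page(b"abc", zlib.crc32(b"abc"))
bad = Page(b"abd", zlib.crc32(b"abc"))
```

With these example pages, `classify_page(bad, True)` would be reported, while `classify_page(bad, False)` would be treated as a possible torn page and left to the normal recovery path.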

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Re: Block-level CRC checks at 2009-12-01 11:58:03 from Simon Riggs

Responses

Re: Block-level CRC checks at 2009-12-01 12:38:37 from Simon Riggs
Re: Block-level CRC checks at 2009-12-01 13:30:23 from Heikki Linnakangas
Re: Block-level CRC checks at 2009-12-01 17:55:05 from Joshua D. Drake
