Re: Page Checksums

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Subject: Re: Page Checksums
Date: 2011-12-21 15:29:45
Message-ID: 201112211629.45491.andres@anarazel.de
Lists: pgsql-hackers

On Wednesday, December 21, 2011 04:21:53 PM Kevin Grittner wrote:
> Greg Smith <greg(at)2ndQuadrant(dot)com> wrote:
> >> Some people think I border on the paranoid on this issue.
> >
> > Those people are also out to get you, just like the hardware.
>
> Hah! I *knew* it!
>
> >> Are you arguing that autovacuum should be disabled after crash
> >> recovery? I guess if you are arguing that a database VACUUM
> >> might destroy recoverable data when hardware starts to fail, I
> >> can't argue.
> >
> > A CRC failure suggests to me a significantly higher possibility
> > of hardware likely to lead to more corruption than a normal crash
> > does though.
>
> Yeah, the discussion has me coming around to the point of view
> advocated by Andres: that it should be treated the same as corrupt
> pages detected through other means. But that can only be done if
> you eliminate false positives from hint-bit-only updates. Without
> some way to handle that, I guess that means the idea is dead.
>
> Also, I'm not sure that our shop would want to dedicate any space
> per page for this, since we're comparing between databases to ensure
> that values actually match, row by row, during idle time. A CRC or
> checksum is a lot weaker than that. I can see where it would be
> very valuable where more rigorous methods aren't in use; but it
> would really be just extra overhead with little or no benefit for
> most of our database clusters.
Comparing between databases will by no means catch failures in all of your
data, because you surely will not use all indexes. With index-only scans the
likelihood of unnoticed heap corruption also increases.
E.g. I have seen disk-level corruption silently damage a unique index so
that it no longer covered all the data, which led to rather big problems.
Not everyone can do regular dump+restore tests to protect against such
scenarios...

Andres
