Re: Page Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Page Checksums
Date: 2011-12-21 07:01:16
Message-ID: 4EF1843C.6090101@2ndQuadrant.com
Lists: pgsql-hackers

On 12/19/2011 06:14 PM, Kevin Grittner wrote:
>> But if you need all that infrastructure just to get the feature
>> launched, that's a bit hard to stomach.
>>
>
> Triggering a vacuum or some hypothetical "scrubbing" feature?
>

What you were suggesting doesn't require triggering just a vacuum
though--it requires triggering some number of vacuums, for all impacted
relations. You said yourself that "all tables if there's no way to
rule any of them out" was a possibility. I'm just pointing out that
scheduling that level of work is a logistics headache, and it would be
reasonable for people to expect some help with that were it to become a
necessary thing falling out of the implementation.

> Some people think I border on the paranoid on this issue.

Those people are also out to get you, just like the hardware.

> Are you arguing that autovacuum should be disabled after crash
> recovery? I guess if you are arguing that a database VACUUM might
> destroy recoverable data when hardware starts to fail, I can't
> argue.

A CRC failure suggests to me a significantly higher probability of
failing hardware, and therefore of further corruption, than a normal
crash does though.

>> The main way I expect to validate this sort of thing is with an as
>> yet unwritten function to grab information about a data block from
>> a standby server for this purpose, something like this:
>>
>> Master: Computed CRC A, Stored CRC B; error raised because A != B
>> Standby: Computed CRC C, Stored CRC D
>>
>> If C == D && A == C, the corruption is probably overwritten bits of
>> the CRC B.
>>
>
> Are you arguing we need *that* infrastructure to get the feature
> launched?
>

No; just pointing out the things I'd eventually expect people to want,
because they help answer questions about what to do when CRC failures
occur. The most reasonable answer to "what should I do about suspected
corruption on a page?" in most of the production situations I worry
about is "see if it's recoverable from the standby". I see this as
being similar to how RAID-1 works: if you find garbage on one drive,
and you can get a clean copy of the block from the other one, use that
to recover the missing data. If you don't have that capability, you're
stuck with no clear path forward when a CRC failure happens, as you
noted downthread.

This obviously gets troublesome if you've recently written a page out,
so there's some concern about whether you are checking against the
correct version of the page, depending on how far the standby's replay
has progressed. I see that as being a case that's also possible to recover from
though, because then the page you're trying to validate on the master is
likely sitting in the recent WAL stream. This is already the sort of
thing companies doing database recovery work (of which we are one) deal
with, and I doubt any proposal will cover every possible situation. In
some cases there may be no better answer than "show all the known
versions and ask the user to sort it out". The method I suggested would
sometimes kick out an automatic fix.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
