Re: Page Checksums

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Page Checksums
Date: 2011-12-19 19:27:08
Message-ID: CA+TgmoZhSKAP-TN6N2ahe-+zfZn_L-T_ykVOekyuCU_Z2Kh+=Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 19, 2011 at 12:07 PM, David Fetter <david(at)fetter(dot)org> wrote:
> On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote:
>> On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> > * Aidan Van Dyk (aidan(at)highrise(dot)ca) wrote:
>> >> But the scary part is you don't know how long *ago* the crash was.
>> >> Because a hint-bit-only change w/ a torn-page is a "non event" in
>> >> PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try
>> >> and "scrub" every page in the database.
>> >
>> > Fair enough, but, could we distinguish these two cases?  In other words,
>> > would it be possible to detect if a page was torn due to a 'traditional'
>> > crash and not complain in that case, but complain if there's a CRC
>> > failure and it *doesn't* look like a torn page?
>>
>> No.
>
> Would you be so kind as to elucidate this a bit?

Well, basically, Stephen's proposal was pure hand-waving. :-)

I don't know of any magic trick that would allow us to know whether a
CRC failure "looks like a torn page". The only information we're
going to get is the knowledge of whether the CRC matches or not. If
it doesn't, it's fundamentally impossible for us to know why. We know
the page contents are not as expected - that's it!

It's been proposed before that we could examine the page, consider all
the unset hint bits that could be set, and try all combinations of
setting and clearing them to see whether any of them produce a valid
CRC. But, as Tom has pointed out previously, that has quite a large
chance of making a page that's *actually* been corrupted look OK. If
you have 30 or so unset hint bits, odds are very good that some
combination will produce the 32-bit CRC you're expecting.
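
As a rough back-of-the-envelope check on that last point, here is a
small sketch. It assumes, purely for illustration, that each candidate
hint-bit setting yields an independent, uniformly distributed CRC. A
real CRC is not a random function, so treat the numbers as indicative
only. The function name p_false_match is made up for this example.

```python
import math

def p_false_match(unset_hint_bits: int, crc_bits: int = 32) -> float:
    """Chance that at least one of the 2**k hint-bit settings matches the
    stored CRC, ASSUMING each candidate CRC is uniform and independent."""
    trials = 2 ** unset_hint_bits          # every combination of set/clear
    p_single = 2.0 ** -crc_bits            # one candidate matching by luck
    # 1 - (1 - p)^trials, computed via log1p/expm1 to avoid rounding to 0.
    return -math.expm1(trials * math.log1p(-p_single))

if __name__ == "__main__":
    for k in (20, 30, 32, 36):
        print(f"{k} unset hint bits: {p_false_match(k):.4f}")
```

Under that assumption, 30 unset bits already give a corrupted page
roughly a one-in-five chance of being "repaired" into looking valid,
and the chance climbs quickly toward certainty a few bits past 32.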

To put this another way, we currently WAL-log just about everything.
We get away with NOT WAL-logging some things when we don't care about
whether they make it to disk. Hint bits, killed index tuple pointers,
etc. cause no harm if they don't get written out, even if some other
portion of the same page does get written out. But as soon as you CRC
the whole page, now absolutely every single bit on that page becomes
critical data which CANNOT be lost. IOW, it now requires the same
sort of protection that we already need for our other critical updates
- i.e. WAL logging. Or you could introduce some completely new
mechanism that serves the same purpose, like MySQL's double-write
buffer.
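
To make the torn-page problem concrete, here is a toy sketch. The
layout (a 4-byte CRC at the front of the page), the 512-byte atomic
sector, and the helper names are all assumptions for illustration, not
PostgreSQL's actual page format. It sets a single "hint bit" late in
the page, then simulates a crash after only the first sector of the
new image reaches disk.

```python
import zlib

PAGE_SIZE = 8192   # PostgreSQL's default block size
SECTOR = 512       # assumed atomic write unit (hypothetical)
CRC_BYTES = 4      # toy layout: checksum stored in the first 4 bytes

def with_checksum(body: bytes) -> bytes:
    """Prefix the page body with its CRC-32."""
    return zlib.crc32(body).to_bytes(CRC_BYTES, "little") + body

def verify(page: bytes) -> bool:
    """True if the stored CRC matches the page body."""
    stored = int.from_bytes(page[:CRC_BYTES], "little")
    return stored == zlib.crc32(page[CRC_BYTES:])

def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash: only the first N sectors of `new` hit the disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]

old_body = bytes(PAGE_SIZE - CRC_BYTES)
old_page = with_checksum(old_body)

new_body = bytearray(old_body)
new_body[6000] |= 0x01                 # set one "hint bit", not WAL-logged
new_page = with_checksum(bytes(new_body))

torn = torn_write(old_page, new_page, sectors_written=1)

if __name__ == "__main__":
    print("old page valid: ", verify(old_page))
    print("new page valid: ", verify(new_page))
    print("torn page valid:", verify(torn))
    print("torn body equals old body:", torn[CRC_BYTES:] == old_body)
```

The torn page carries the new CRC but the old body. Without a
checksum that page is perfectly usable, since only a hint bit went
missing; with one, it fails verification even though no real data was
lost.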

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
