Re: Block-level CRC checks

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 19:41:57
Message-ID: 407d949e0912011141m6404bb33s938daecd50eb7e32@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 1, 2009 at 7:19 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> However, that solution would not detect subtle corruption, like
> single-bit-flipping issues caused by quantum errors.

Well, there is a solution for this: ECC RAM. There's *no* software
solution for it. The corruption can just as easily happen the moment
you write the value, before you calculate any checksum, or in the
register holding the value before you even write it. Or it could occur
the moment after you finish checking the checksum. Also, you're not
going to be able to be sure you're checking the actual DRAM and not
the L2 cache or the processor's L1/L0 caches.
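
To make the timing argument concrete, here is a minimal Python sketch. It
is only an illustration: the buffer contents and the flip_bit helper are
made up, and zlib's CRC-32 stands in for whatever checksum would really be
used. A bit that flips before the checksum is computed gets checksummed
along with the rest of the data, so verification passes; only a flip after
the checksum is computed is caught.

```python
import zlib


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated memory error)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def checksum_ok(data: bytes, stored_crc: int) -> bool:
    """Verify data against a previously stored CRC-32."""
    return zlib.crc32(data) == stored_crc


original = b"tuple data " * 16  # hypothetical page contents

# Case 1: the bit flips in memory *before* the checksum is calculated.
# The CRC is computed over the already-corrupted bytes, so it matches.
corrupted_early = flip_bit(original, 5)
crc_of_garbage = zlib.crc32(corrupted_early)
early_detected = not checksum_ok(corrupted_early, crc_of_garbage)

# Case 2: the bit flips *after* the checksum is calculated.
# The stored CRC describes the good bytes, so the mismatch is visible.
good_crc = zlib.crc32(original)
corrupted_late = flip_bit(original, 5)
late_detected = not checksum_ok(corrupted_late, good_crc)

print("flip before checksum detected:", early_detected)
print("flip after checksum detected: ", late_detected)
```

The first case is the one no software checksum can cover, which is why the
fix has to live in the memory hardware.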

ECC RAM solves this problem properly and it does work. There's not
much point in paying a much bigger cost for an ineffective solution.

> Also, it would
> require reading back each page as it's written to disk, which is OK for
> a bunch of single-row writes, but for bulk data loads a significant problem.

Not sure what that really means for Postgres. It would just mean
reading back the same page of memory from the filesystem cache that we
just wrote.

It sounds like you're describing fsyncing every single page to disk
and then waiting a full disk rotation (1min/7200, or even 1min/15k on
a 15k RPM drive) to do a direct read of every single page. That's not
a 20% performance hit, though; it's far worse. We would have to change
our mascot from the elephant to a snail, I think.
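
As a rough back-of-the-envelope check, the arithmetic can be sketched in
Python. The assumptions are mine: an 8 kB page, exactly one full rotation
of waiting per page, and no other costs (seek time, fsync overhead) at all,
so this is an optimistic ceiling rather than a measurement.

```python
PAGE_SIZE = 8192  # bytes; assumed page size for this estimate


def rotation_seconds(rpm: float) -> float:
    """Time for one full platter rotation."""
    return 60.0 / rpm


def max_pages_per_second(rpm: float) -> float:
    """Upper bound on pages written if each page costs one rotation."""
    return rpm / 60.0


def max_write_rate_mib(rpm: float) -> float:
    """The same bound expressed in MiB per second."""
    return max_pages_per_second(rpm) * PAGE_SIZE / (1024 * 1024)


for rpm in (7200, 15000):
    print(
        f"{rpm} RPM: {rotation_seconds(rpm) * 1000:.2f} ms per rotation, "
        f"at most {max_pages_per_second(rpm):.0f} pages/s, "
        f"{max_write_rate_mib(rpm):.2f} MiB/s"
    )
```

Under those assumptions a 7200 RPM drive tops out around 120 pages per
second (under 1 MiB/s) and a 15k RPM drive around 250 pages per second
(under 2 MiB/s), which is where the snail comes in for bulk loads.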

You could imagine a more complex solution where a separate process
waits until the next checkpoint, then does direct reads for all the
blocks written since the previous checkpoint (which have now been
fsynced) and verifies that each block on disk has a valid CRC. I'm
not sure even direct reads let you get the block on disk if someone
else has written the block into cache, though. If they do, then this
sounds like it could be made to work efficiently (with sequential
bitmap-style scans) and could be quite handy. What I like about that
is you could deprioritize this process's I/O so that it doesn't impact
the main processing. As things stand this wouldn't cover pages
written because they were dirtied by hint bit updates, but that could
be addressed a few different ways.
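
The post-checkpoint verifier idea could be sketched roughly as follows, in
Python. Everything here is hypothetical: the page layout (a 4-byte CRC-32
followed by the payload) is invented for the example and is not the
Postgres page format, the list of dirty block numbers is assumed to be
handed over by some other mechanism, and the reads are ordinary buffered
reads rather than direct I/O, so this does not answer the open question
about bypassing the cache.

```python
import struct
import zlib

PAGE_SIZE = 8192  # assumed block size; the layout below is made up


def seal_page(payload: bytes) -> bytes:
    """Build a page: 4-byte little-endian CRC-32, then the payload."""
    if len(payload) != PAGE_SIZE - 4:
        raise ValueError("payload must fill the page exactly")
    return struct.pack("<I", zlib.crc32(payload)) + payload


def page_is_valid(page: bytes) -> bool:
    """Check that the stored CRC matches the payload of one page."""
    if len(page) != PAGE_SIZE:
        return False  # short read: treat as a failed block
    (stored,) = struct.unpack_from("<I", page, 0)
    return stored == zlib.crc32(page[4:])


def verify_blocks(path: str, dirty_blocks) -> list:
    """Re-read the given blocks in ascending order and return the
    block numbers whose CRC does not verify."""
    bad = []
    with open(path, "rb") as f:
        # Sorting turns scattered writes into one forward pass over the
        # file, in the spirit of a bitmap-style scan.
        for blkno in sorted(set(dirty_blocks)):
            f.seek(blkno * PAGE_SIZE)
            if not page_is_valid(f.read(PAGE_SIZE)):
                bad.append(blkno)
    return bad
```

A caller would collect the block numbers written since the previous
checkpoint, wait for the checkpoint to finish, and then run
verify_blocks(path, blocks) at low I/O priority, reporting any block
numbers it returns.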

--
greg
