Re: Block-level CRC checks

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-11-30 18:16:17
Message-ID: 1259604977.26322.5.camel@jd-desktop.unknown.charter.com
Lists: pgsql-hackers

On Mon, 2009-11-30 at 13:21 +0000, Simon Riggs wrote:
> On Fri, 2008-10-17 at 12:26 -0300, Alvaro Herrera wrote:
> > So this discussion died with no solution arising to the
> > hint-bit-setting-invalidates-the-CRC problem.
> >
> > Apparently the only solution in sight is to WAL-log hint bits. Simon
> > opines it would be horrible from a performance standpoint to WAL-log
> > every hint bit set, and I think we all agree with that. So we need to
> > find an alternative mechanism to WAL log hint bits.
>
> It occurred to me that maybe we don't need to WAL-log the CRC checks.
>
> Proposal
>
> * We reserve enough space on a disk block for a CRC check. When a dirty
> block is written to disk we calculate and annotate the CRC value, though
> this is *not* WAL logged.
>
> * In normal running we re-check the CRC when we read the block back into
> shared_buffers.
>
> * In recovery we will overwrite the last image of a block from WAL, so
> we ignore the block CRC check, since the WAL record was already CRC
> checked. If full_page_writes = off, we ignore and zero the block's CRC
> for any block touched during recovery. We do those things because the
> block CRC in the WAL is likely to be different to that on disk, due to
> hints.
>
> * We also re-check the CRC on a block immediately before we dirty the
> block (for any reason). This minimises the possibility of in-memory data
> corruption for blocks.
>
> So in the typical case all blocks moving from disk <-> memory and from
> clean -> dirty are CRC checked. So in the case where we have
> full_page_writes = on then we have a good CRC every time. In the
> full_page_writes = off case we are exposed only on the blocks that
> changed during the last checkpoint cycle and only if we crash. That seems
> good because most databases are up 99% of the time, so any corruptions
> are likely to occur in normal running, not as a result of crashes.
>
> This would be a run-time option.
>
> Like it?
>
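
For readers following the thread, the write-time and read-time parts of the quoted proposal could be sketched roughly as below. This is only an illustration and is not taken from any patch: the 8192-byte block size, the position of the reserved CRC field (`CRC_OFFSET`), the use of CRC-32 via `zlib`, and all function names are assumptions made for the example.

```python
import struct
import zlib

BLOCK_SIZE = 8192             # assumed block size
CRC_OFFSET = BLOCK_SIZE - 4   # hypothetical location of the reserved CRC field


def compute_crc(page: bytes) -> int:
    """CRC-32 of the block, with the reserved CRC field treated as zero."""
    body = page[:CRC_OFFSET] + b"\x00\x00\x00\x00"
    return zlib.crc32(body) & 0xFFFFFFFF


def stamp_for_write(page: bytearray) -> None:
    """Annotate a dirty block with its CRC just before it goes to disk.

    Nothing is WAL-logged here, which is the point of the proposal.
    """
    struct.pack_into("<I", page, CRC_OFFSET, compute_crc(bytes(page)))


def verify_on_read(page: bytes, in_recovery: bool = False) -> bool:
    """Re-check the CRC when a block is read back into the buffer pool.

    During recovery the check is skipped, since the block is overwritten
    from a WAL image that was already CRC checked.
    """
    if in_recovery:
        return True
    (stored,) = struct.unpack_from("<I", page, CRC_OFFSET)
    return stored == compute_crc(page)
```

Because the CRC is recomputed each time a dirty block is written, a hint bit set in memory changes the stored CRC on the next write without needing a WAL record of its own.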

Just FYI, Alvaro is out of town and out of email access (almost
exclusively). It may take him another week or so to get back to this.

Joshua D. Drake

> --
> Simon Riggs www.2ndQuadrant.com
>
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
