Re: Block-level CRC checks

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-11-30 18:16:17
Lists: pgsql-hackers
On Mon, 2009-11-30 at 13:21 +0000, Simon Riggs wrote:
> On Fri, 2008-10-17 at 12:26 -0300, Alvaro Herrera wrote:
> > So this discussion died with no solution arising to the
> > hint-bit-setting-invalidates-the-CRC problem.
> > 
> > Apparently the only solution in sight is to WAL-log hint bits.  Simon
> > opines it would be horrible from a performance standpoint to WAL-log
> > every hint bit set, and I think we all agree with that.  So we need to
> > find an alternative mechanism to WAL log hint bits.
> 
> It occurred to me that maybe we don't need to WAL-log the CRC checks.
> 
> Proposal
> 
> * We reserve enough space on a disk block for a CRC check. When a dirty
> block is written to disk we calculate and annotate the CRC value, though
> this is *not* WAL logged.
> * In normal running we re-check the CRC when we read the block back into
> shared_buffers.
> * In recovery we will overwrite the last image of a block from WAL, so
> we ignore the block CRC check, since the WAL record was already CRC
> checked. If full_page_writes = off, we ignore and zero the block's CRC
> for any block touched during recovery. We do those things because the
> block CRC in the WAL is likely to be different to that on disk, due to
> hints.
> * We also re-check the CRC on a block immediately before we dirty the
> block (for any reason). This minimises the possibility of in-memory data
> corruption for blocks.
> So in the typical case all blocks moving from disk <-> memory and from
> clean -> dirty are CRC checked. So in the case where we have
> full_page_writes = on then we have a good CRC every time. In the
> full_page_writes = off case we are exposed only on the blocks that
> changed during last checkpoint cycle and only if we crash. That seems
> good because most databases are up 99% of the time, so any corruptions
> are likely to occur in normal running, not as a result of crashes.
> This would be a run-time option.
> Like it?
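
For anyone skimming the thread, the proposal quoted above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the page layout (CRC in the last 4 bytes), the function names, and the use of a zero CRC to mean "no CRC recorded" are all assumptions made for the example. `zlib.crc32` stands in for whatever CRC the real implementation would use.

```python
import struct
import zlib

PAGE_SIZE = 8192                   # PostgreSQL's default block size
CRC_SIZE = 4                       # assumption: CRC lives in the last 4 bytes
BODY_SIZE = PAGE_SIZE - CRC_SIZE


class PageChecksumError(Exception):
    """Raised when a page read from disk fails its CRC check."""


def stamp_crc(page: bytes) -> bytes:
    """On write-out of a dirty page: compute the CRC and store it in the page.

    Nothing is WAL-logged here, matching the first bullet of the proposal.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("unexpected page size")
    body = page[:BODY_SIZE]
    return body + struct.pack("<I", zlib.crc32(body))


def clear_crc(page: bytes) -> bytes:
    """Recovery with full_page_writes = off: zero the CRC of a touched page."""
    return page[:BODY_SIZE] + b"\x00" * CRC_SIZE


def verify_crc(page: bytes, in_recovery: bool = False) -> None:
    """On read into shared_buffers (or just before dirtying): re-check the CRC."""
    if in_recovery:
        # The WAL record was already CRC checked, so the block CRC is ignored.
        return
    body = page[:BODY_SIZE]
    (stored,) = struct.unpack("<I", page[BODY_SIZE:])
    if stored == 0:
        # Assumption: zero means the CRC was cleared during recovery.
        return
    if stored != zlib.crc32(body):
        raise PageChecksumError("block CRC mismatch")
```

Note that a hint bit set in memory changes the page body without changing the stored CRC, which is why the sketch only stamps the CRC at write-out time and skips the check during recovery.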

Just FYI, Alvaro is out of town and out of email access (almost
exclusively). It may take him another week or so to get back to this.

Joshua D. Drake

> -- 
>  Simon Riggs 

-- Major Contributor
Command Prompt, Inc: - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
