
Re: Block-level CRC checks

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-11-30 18:16:17
Message-ID: 1259604977.26322.5.camel@jd-desktop.unknown.charter.com
Lists: pgsql-hackers
On Mon, 2009-11-30 at 13:21 +0000, Simon Riggs wrote:
> On Fri, 2008-10-17 at 12:26 -0300, Alvaro Herrera wrote:
> > So this discussion died with no solution arising to the
> > hint-bit-setting-invalidates-the-CRC problem.
> > 
> > Apparently the only solution in sight is to WAL-log hint bits.  Simon
> > opines it would be horrible from a performance standpoint to WAL-log
> > every hint bit set, and I think we all agree with that.  So we need to
> > find an alternative mechanism to WAL log hint bits.
> 
> It occurred to me that maybe we don't need to WAL-log the CRC checks.
> 
> Proposal
> 
> * We reserve enough space on a disk block for a CRC check. When a dirty
> block is written to disk we calculate and annotate the CRC value, though
> this is *not* WAL logged.
> 
> * In normal running we re-check the CRC when we read the block back into
> shared_buffers.
> 
> * In recovery we will overwrite the last image of a block from WAL, so
> we ignore the block CRC check, since the WAL record was already CRC
> checked. If full_page_writes = off, we ignore and zero the block's CRC
> for any block touched during recovery. We do those things because the
> block CRC in the WAL is likely to be different to that on disk, due to
> hints.
> 
> * We also re-check the CRC on a block immediately before we dirty the
> block (for any reason). This minimises the possibility of in-memory data
> corruption for blocks.
> 
> So in the typical case all blocks moving from disk <-> memory and from
> clean -> dirty are CRC checked. So in the case where we have
> full_page_writes = on then we have a good CRC every time. In the
> full_page_writes = off case we are exposed only on the blocks that
> changed during last checkpoint cycle and only if we crash. That seems
> good because most databases are up 99% of the time, so any corruptions
> are likely to occur in normal running, not as a result of crashes.
> 
> This would be a run-time option.
> 
> Like it?
> 
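
For readers following the thread, here is a minimal sketch of the proposal quoted above. It is an illustration under stated assumptions, not PostgreSQL code: the names (`write_block`, `read_block`, `ChecksumError`), the 4-byte CRC slot at the start of the block, the use of zlib's CRC-32, and the convention that a zeroed CRC means "not checked" are all hypothetical. It shows the CRC being stamped only when a block is written out, re-checked when the block is read back, and skipped during recovery.

```python
import zlib

BLOCK_SIZE = 8192
CRC_SIZE = 4  # assumed: CRC stored in the first 4 bytes of each block


class ChecksumError(Exception):
    """Raised when a block read from disk fails its CRC check."""


def compute_crc(block):
    # The CRC covers everything except the reserved CRC slot itself.
    return zlib.crc32(bytes(block[CRC_SIZE:]))


def write_block(disk, blkno, block):
    # Stamp the CRC at write time only. Nothing here is WAL-logged, so
    # hint-bit changes made in memory are simply covered by the next write.
    out = bytearray(block)
    out[:CRC_SIZE] = compute_crc(out).to_bytes(CRC_SIZE, "little")
    disk[blkno] = bytes(out)


def zero_crc(block):
    # Assumed convention for recovery with full_page_writes = off:
    # a zeroed CRC slot marks the block as unchecked.
    block[:CRC_SIZE] = bytes(CRC_SIZE)


def read_block(disk, blkno, in_recovery=False):
    block = bytearray(disk[blkno])
    if in_recovery:
        # Recovery overwrites the block from WAL, which has its own CRC,
        # so the on-disk block CRC is ignored.
        return block
    stored = int.from_bytes(block[:CRC_SIZE], "little")
    # Caveat of this sketch: a genuine CRC of 0 is also treated as unchecked.
    if stored != 0 and stored != compute_crc(block):
        raise ChecksumError(f"block {blkno}: CRC mismatch")
    return block


# Usage: a hint-bit style change in memory does not invalidate anything,
# because the CRC is only recomputed when the dirty block is written out.
disk = {}
page = bytearray(BLOCK_SIZE)
page[100] = 7
write_block(disk, 0, page)

page = read_block(disk, 0)
page[200] |= 0x01  # set a "hint bit" without any WAL record
write_block(disk, 0, page)
page = read_block(disk, 0)  # passes the CRC check
```

The extra check before dirtying a block, which the proposal also calls for, would reuse the same comparison on the in-memory copy and is omitted here for brevity.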

Just FYI, Alvaro is out of town and almost entirely without email
access. It may take him another week or so to get back to this.

Joshua D. Drake



> -- 
>  Simon Riggs           www.2ndQuadrant.com
> 
> 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander

