Re: Block-level CRC checks

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql(at)mohawksoft(dot)com,Hannu Krosing <hannu(at)2ndquadrant(dot)com>,Decibel! <decibel(at)decibel(dot)org>,Alvaro Herrera <alvherre(at)commandprompt(dot)com>,Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2008-10-01 16:22:40
Message-ID: 20081001162240.GI16893@yugib.highrise.ca
Lists: pgsql-hackers
* Gregory Stark <stark(at)enterprisedb(dot)com> [081001 11:59]:
 
> If setting a hint bit cleared a flag on the buffer header then the
> checksumming process could set that flag, begin checksumming, and check that
> the flag is still set when he's finished.
> 
> Actually I suppose that wouldn't actually be good enough. He would have to do
> the i/o and check that the checksum was still valid after the i/o. If not then
> he would have to recalculate the checksum and repeat the i/o. That might make
> the idea a loser since I think the only way it wins is if you rarely actually
> get someone setting the hint bits during i/o anyways.

A doubled-write is essentially "free" with PostgreSQL because it's not
doing direct IO, rather relying on the OS page cache to be efficient.
So if you write block A and then call write on block A immediately (or,
realistically, after a redo of the checksum), the first write is almost
*never* going to take IO bandwidth to your spindles...

But the problem is if something crashes (or interrupts PG) between those
two writes, you've got a block of data in the page cache (and possibly
on disk) that PG will no longer read in, because the CRC/checksum
fails despite the actual content being valid...

So if you're going to be making PG refuse to read in blocks with a bad
CRC/csum, you need to guarantee that nothing fiddles with the block
between the start of the CRC and the completion of the write().

One possibility would be to "double-buffer" the write... i.e. as you
calculate your CRC, you're doing it on a local copy of the block, which
you hand to the OS to write...  If you're touching the whole block of
memory to CRC it, it isn't *ridiculously* more expensive to copy the
memory somewhere else as you do it...
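
A rough sketch of that double-buffering idea, in C. Everything named here
is hypothetical and not taken from the PostgreSQL source: the 8192-byte
BLOCK_SIZE, the choice to keep the checksum in the first 4 bytes of the
block, and the function names. It copies first and then checksums the
private copy rather than doing both in a single pass, which gives the same
guarantee: the checksum always matches the bytes handed to the OS, even if
someone sets a hint bit in the shared buffer meanwhile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumed block size */
#define CSUM_SIZE  4      /* hypothetical: checksum lives in the first 4 bytes */

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table. */
uint32_t crc32_buf(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Copy the shared block into a private buffer, then checksum the copy.
 * Later changes to the shared block cannot invalidate the copy's checksum. */
void snapshot_with_checksum(const unsigned char *shared, unsigned char *priv)
{
    assert(shared != NULL && priv != NULL);
    memcpy(priv, shared, BLOCK_SIZE);
    uint32_t crc = crc32_buf(priv + CSUM_SIZE, BLOCK_SIZE - CSUM_SIZE);
    memcpy(priv, &crc, CSUM_SIZE);
}

/* Returns 1 if the stored checksum matches the block contents, else 0. */
int block_checksum_ok(const unsigned char *block)
{
    uint32_t stored;
    memcpy(&stored, block, CSUM_SIZE);
    return stored == crc32_buf(block + CSUM_SIZE, BLOCK_SIZE - CSUM_SIZE);
}

/* Write the private, checksummed copy; the shared buffer is never handed
 * to the OS directly. Returns whatever pwrite() returns. */
ssize_t write_block_double_buffered(int fd, const unsigned char *shared,
                                    off_t offset)
{
    unsigned char priv[BLOCK_SIZE];
    snapshot_with_checksum(shared, priv);
    return pwrite(fd, priv, BLOCK_SIZE, offset);
}
```

The extra cost over checksumming in place is one memcpy of the block, which
is the "not ridiculously more expensive" part.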

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan(at)highrise(dot)ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
