Skip site navigation (1) Skip section navigation (2)

Re: Block-level CRC checks

From: Brian Hurt <bhurt(at)janestcapital(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql(at)mohawksoft(dot)com, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, "Decibel!" <decibel(at)decibel(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2008-10-02 13:07:40
Message-ID: 48E4C79C.8090104@janestcapital.com (view raw or flat)
Thread:
Lists: pgsql-hackers
I have a stupid question wrt hint bits and CRC checksums- it seems to me 
that it should be possible, if you change the hint bits, to be able to 
very easily calculate what the change in the CRC checksum should be.

The basic idea of the CRC checksum is that, given a message x, the 
checksum is x mod p where p is some constant polynomial (all operations 
are in GF(2^n)).  Now, the interesting thing about modulo is that it's 
distributable- that is to say, (x ^ y) mod p = (x mod p) ^ (y mod p), 
and that
(x * y) mod p = ((x mod p) * (y mod p)) mod p (I'm using ^ instead of 
the more traditional + here to emphasize that it's xor, not addition, 
I'm doing).  So let's assume we're updating a word a known n bytes from 
the end of the message- we calculate y = old_value ^ new_value, so our 
change is the equivalent of changing the original block m to (m ^ (y * 
x^{8n})).  The new checksum is then (m ^ (y * x^{8n})) mod p =
(m mod p) ^ (((y mod p) * (x^{8n} mod p)) mod p).  Now, m mod p is the 
original checksum, and (x^{8n} mod p) is a constant for a given n, and 
the multiplication modulo p can be implemented as a set of table 
lookups, one per byte.

The take away here is that, if we know ahead of time where the 
modifications are going to be, we can make updating the CRC checksum 
(relatively) cheap.  So, for example, a change of the hint bits would 
only need 4 tables lookups and a couple of xors to update the block's 
CRC checksum.  We could extended this idea- break the 8K page up into, 
say, 32 256-byte "subblocks".  Changing any given subblock would require 
only re-checksumming that subblock and then updating the CRC checksum.  
The reason for the subblocks would be to limit the number of tables 
necessary- each subblock requires it's own set of 4 256-word tables, so 
having 32 subblocks means that the tables involved would be 32*4*256*4 = 
128K in size.  Going to, say, 64 byte subblocks means needing 128 tables 
or 512K of tables.

If people are interested, I could bang out the code tonight, and post it 
to the list tomorrow.

Brian


In response to

Responses

pgsql-hackers by date

Next:From: Jonah H. HarrisDate: 2008-10-02 13:12:16
Subject: Re: Block-level CRC checks
Previous:From: Tom LaneDate: 2008-10-02 12:37:59
Subject: Re: FSM rewrite committed, loose ends

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group