Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Date: 2013-04-30 22:39:09
Message-ID: 5180480D.7090507@2ndQuadrant.com
Lists: pgsql-hackers

On 4/30/13 5:26 PM, Martijn van Oosterhout wrote:
> I came across this today: Data Integrity Extensions, basically a
> standard for having an application calculate a checksum of a block and
> submit it together with the block so that the disk can verify that
> the block it is writing matches what the application sent.
>
> It appears SCSI has standardised on a CRC-16 checksum with polynomial
> 0x18bb7.

To be pedantic for a minute (for the first time *ever* on pgsql-hackers),
it's not quite all of SCSI. iSCSI has joined btrfs by settling on
CRC-32C with the Castagnoli polynomial, as mentioned in that first
reference. CRC-32C is also the one that the SSE4.2 instructions
accelerate. All the work going on around the T10/Data Integrity Field
standard is nice. I think it's going to leave a lot of PostgreSQL users
behind though. I'd bet a large sum of money that five years from now,
there will still be more than 10X as many PostgreSQL servers on EC2 as
on T10/DIF capable hardware.
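
For illustration, the two CRCs discussed above can be sketched bit by
bit in Python. This is a minimal sketch and not code from this thread:
the function names are made up for the example, the parameters
(polynomial, initial value, bit order, final XOR) are the commonly
published ones for CRC-16/T10-DIF and CRC-32C, and a real
implementation would use lookup tables or the SSE4.2 instruction
rather than a per-bit loop.

```python
def crc16_t10dif(data: bytes) -> int:
    """CRC-16 with polynomial 0x18BB7 (0x8BB7 without the top bit).

    Processed most-significant bit first, initial value 0, no final XOR.
    """
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def crc32c(data: bytes) -> int:
    """CRC-32C (Castagnoli), polynomial 0x1EDC6F41.

    Processed least-significant bit first, so the loop uses the
    bit-reversed constant 0x82F63B78. Initial value and final XOR
    are both 0xFFFFFFFF.
    """
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF
```

Both functions can be checked against the standard "123456789" test
string, which should give 0xD0DB and 0xE3069283 respectively.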

I feel pretty good that this new FNV-1a implementation is a good
trade-off spot that balances error detection and performance impact. If
you want a 16-bit checksum that seems ready for beta today, we can't do
much better. Fletcher-16 had too many detection holes, and the WAL
checksum was way too expensive. Optimized FNV-1a is even better than
unoptimized Fletcher-16, without as many detection issues. Can't even
complain about the code bloat for this part either--checksum.c is only
68 lines if you take out its documentation.
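
As a rough idea of what FNV-1a does, here is the textbook 32-bit
version in Python, with a simple XOR fold down to 16 bits. This is a
sketch under stated assumptions, not the optimized variant in
checksum.c: the function names are hypothetical, the constants are the
standard published FNV offset basis and prime, and the XOR fold is only
one possible way to reduce the result to 16 bits.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Textbook FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def fold_to_16(h: int) -> int:
    """Reduce a 32-bit hash to 16 bits by XORing its two halves.

    Assumption for this example only; other reductions are possible.
    """
    return (h >> 16) ^ (h & 0xFFFF)


def page_checksum_16(page: bytes) -> int:
    """Hypothetical helper: a 16-bit checksum over one page of bytes."""
    return fold_to_16(fnv1a_32(page))
```

The per-byte loop is what makes the plain form slow; the appeal of an
optimized variant is that the same XOR-and-multiply step can be applied
to many lanes at once.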

The WAL logging of hint bits is where the scary part of this feature
has always been for me. My gut feel is that it needed to become
available as an option anyway. Just this month we've had
two customer issues pop up where we had to look for block differences
between a master and a standby. The security update pushed some habitual
update stragglers onto a release that has the 9.1.6 index corruption
fix, and we're looking for cases where standby indexes might have been
corrupted by that bug. In this case the comparisons can be limited to
indexes, so hint bits are thankfully not involved.

But having false positives pop out of comparing a master and standby due
to hint bits makes this sort of process much harder in general. Being
able to turn checksums on and then compare more things between master
and standby, without expecting any block differences, will make both
routine quality auditing and forensics of broken clusters so much easier.
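
The kind of block comparison described here can be sketched as a naive
byte-for-byte walk over two copies of the same relation file. This is
only an illustration: the function name is made up, it assumes the
default 8192-byte page size, and it makes no attempt to mask out hint
bits or anything else that may legitimately differ between the two
servers, which is exactly why such differences show up as false
positives.

```python
import io
from typing import BinaryIO, Iterator

BLOCK_SIZE = 8192  # PostgreSQL's default page size (BLCKSZ)


def differing_blocks(primary: BinaryIO, standby: BinaryIO,
                     block_size: int = BLOCK_SIZE) -> Iterator[int]:
    """Yield the block numbers at which the two files differ.

    Blocks are compared byte for byte. If one file is longer, the
    extra blocks are reported as differences too.
    """
    block_no = 0
    while True:
        a = primary.read(block_size)
        b = standby.read(block_size)
        if not a and not b:
            return  # both files exhausted
        if a != b:
            yield block_no
        block_no += 1


if __name__ == "__main__":
    # In-memory stand-ins for two copies of one three-block file,
    # with a single flipped byte in the second block of the standby.
    page = bytes(BLOCK_SIZE)
    changed = b"\x01" + bytes(BLOCK_SIZE - 1)
    master_copy = io.BytesIO(page * 3)
    standby_copy = io.BytesIO(page + changed + page)
    print(list(differing_blocks(master_copy, standby_copy)))
```

With real files, the two arguments would be the same relation segment
opened in binary mode on each server's data directory copy.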

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
