Quick Links

Re: What exactly is our CRC algorithm?

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: What exactly is our CRC algorithm?
Date:	2014-10-27 16:02:59
Message-ID:	544E6CB3.40800@vmware.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 10/09/2014 12:13 AM, Andres Freund wrote:
> On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:
>> As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't
>> correspond to any bit-by-bit CRC variant and polynomial. My math skills are
>> not strong enough to determine what the consequences of that are. It might
>> still be a decent checksum. Or not. I couldn't tell if the good error
>> detection properties of the normal CRC-32 polynomial apply to our algorithm
>> or not.
>
> Additional interesting datapoints are that hstore and ltree contain the
> same tables - but properly use the reflected computation.
>
>> Thoughts?
>
> It clearly seems like a bad idea to continue with this - I don't think
> anybody here knows which guarantees this gives us.
>
> The question is how can we move away from this. There's unfortunately
> two places that embed PGC32 that are likely to prove problematic when
> fixing the algorithm: pg_trgm and tsgist both seem to include crc's in
> their logic in a persistent way. I think we should provide
> INIT/COMP/FIN_PG32 using the current algorithm for these.

Agreed, it's not worth breaking pg_upgrade for this.

> If we're switching to a saner computation, we should imo also switch to
> a better polynom - CRC-32C has better error detection capabilities than
> CRC32 and is available in hardware. As we're paying the price pf
> breaking compat anyway...

Agreed.

> Arguably we could also say that given that there's been little evident
> problems with the borked computation we could also switch to a much
> faster hash instead of continuing to use crc...

I don't feel like taking the leap. Once we switch to slice-by-4/8 and/or
use a hardware instruction when available, CRC is fast enough.

I came up with the attached patches. They do three things:

1. Get rid of the 64-bit CRC code. It's not used for anything, and
haven't been for years, so it doesn't seem worth spending any effort to
fix them.

2. Switch to CRC-32C (Castagnoli) for WAL and other places that don't
need to remain compatible across major versions.

3. Use the same lookup table for hstore and ltree, as used for the
legacy "almost CRC-32" algorithm. The tables are identical, so might as
well.

Any objections?

- Heikki

Attachment	Content-Type	Size
0001-Remove-support-for-64-bit-CRC.patch	text/x-diff	20.6 KB
0002-Switch-to-CRC-32C-in-WAL-and-other-places.patch	text/x-diff	43.6 KB

In response to

Re: What exactly is our CRC algorithm? at 2014-10-08 21:13:49 from Andres Freund

Responses

Re: What exactly is our CRC algorithm? at 2014-11-04 09:47:18 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Heikki Linnakangas	2014-10-27 16:12:26	Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Previous Message	Andrew Dunstan	2014-10-27 16:02:50	Re: TAP test breakage on MacOS X