Re: What exactly is our CRC algorithm?

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: What exactly is our CRC algorithm?
Date: 2014-10-08 22:23:59
Message-ID: 5435B97F.3010700@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/10/14 10:13, Andres Freund wrote:
> On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:
>> Hmm. So the simple, non-table driven, calculation gives the same result as
>> using the lookup table using the reflected lookup code. That's expected; the
>> lookup method is supposed to be the same, just faster. However, using the
>> "normal" lookup code, but with a "reflected" lookup table, produces the same
>> result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But
>> AFAICS, that's an incorrect combination. You're supposed to the
>> non-reflected lookup table with the non-reflected lookup code; you can't mix
>> and match.
>>
>> As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't
>> correspond to any bit-by-bit CRC variant and polynomial. My math skills are
>> not strong enough to determine what the consequences of that are. It might
>> still be a decent checksum. Or not. I couldn't tell if the good error
>> detection properties of the normal CRC-32 polynomial apply to our algorithm
>> or not.
> Additional interesting datapoints are that hstore and ltree contain the
> same tables - but properly use the reflected computation.
>
>> Thoughts?
> It clearly seems like a bad idea to continue with this - I don't think
> anybody here knows which guarantees this gives us.
>
> The question is how can we move away from this. There's unfortunately
> two places that embed PGC32 that are likely to prove problematic when
> fixing the algorithm: pg_trgm and tsgist both seem to include crc's in
> their logic in a persistent way. I think we should provide
> INIT/COMP/FIN_PG32 using the current algorithm for these.
>
> If we're switching to a saner computation, we should imo also switch to
> a better polynom - CRC-32C has better error detection capabilities than
> CRC32 and is available in hardware. As we're paying the price pf
> breaking compat anyway...
>
> Arguably we could also say that given that there's been little evident
> problems with the borked computation we could also switch to a much
> faster hash instead of continuing to use crc...
>
> Greetings,
>
> Andres Freund
>
Could a 64 bit variant of some kind be useful as an option - in addition
to a correct 32 bit?

As most people have 64 bit processors and storage is less constrained
now-a-days, as well as we tend to store much larger chunks of data.

Cheers,
Gavin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-10-08 22:24:51 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Peter Geoghegan 2014-10-08 22:18:46 Re: Promise index tuples for UPSERT