| From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
|---|---|
| To: | Karen Hill <karen_hill22(at)yahoo(dot)com> |
| Cc: | pgsql-general(at)postgresql(dot)org |
| Subject: | Re: cyclical redundancy checksum algorithm(s)? |
| Date: | 2006-09-28 07:53:50 |
| Message-ID: | 451B7F8E.30303@sigaev.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
>> You sure that's actually what he said? A change in CRC proves the data
>> changed, but lack of a change does not prove it didn't.
>
> "To quickly determine if rows have changed, we rely on a cyclic
> redundancy checksum (CRC) algorithm. If the CRC is identical for the
>
>> "summary" functions, such as an MD5 hash. I wouldn't trust it at all
>> with a 32-bit CRC, and not much with a 64-bit CRC. Too much risk of
>> collision.
Small example of collisions for crc32:
0x38ee5531
Hundemarke
92294
0x59471e4f
raciner
tranchefiler
0x947bb6c0
Betriebsteile
4245
I had make a lot of work when choosing hash function for tsearch2. Also, I had
find that popular hash algorithms produce more collision for non-ascii
languages... CRC32 is more "smooth".
On dictionary with 332296 unique words CRC32 produces 11 collisions, perl's hash
function - 35, pgsql's hash_any - 12.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Teodor Sigaev | 2006-09-28 07:55:48 | Re: Full Text fuzzy search |
| Previous Message | Matthias.Pitzl | 2006-09-28 07:46:29 | Definition of return types for own functions? |