From: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | jff: checksum algorithm is not as intended |
Date: | 2021-08-30 00:18:33 |
Message-ID: | 7d018a5e73186a08f891e46fa25bdf18@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Good day.
Current checksum is not calculated in intended way and
has the flaw.
Single round function is written as:
#define CHECKSUM_COMP(checksum, value) do {\
uint32 __tmp = (checksum) ^ (value);\
(checksum) = __tmp * FNV_PRIME ^ (__tmp >> 17);\
} while (0)
And looks like it was intended to be
(checksum) = (__tmp * FNV_PRIME) ^ (__tmp >> 17);
At least original Florian Pflug suggestion were correctly written
in this way (but with shift 1):
https://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396%40phlo.org
But due to C operation precedence it is actually calculated as:
(checksum) = __tmp * (FNV_PRIME ^ (__tmp >> 17));
It has more longer collision chains and worse: it has 256 pathological
result slots of shape 0xXX000000 each has 517 collisions in average.
Totally 132352 __tmp values are collided into this 256 slots.
That is happens due to if top 16 bits are happens to be
0x0326 or 0x0327, then `FNV_PRIME ^ (__tmp >> 17) == 0x1000000`,
and then `__tmp * 0x1000000` keeps only lower 8 bits. That means,
9 bits masked by 0x0001ff00 are completely lost.
mix(0x03260001) == mix(0x03260101) = mix(0x0327aa01) == mix(0x0327ff01)
(where mix is a `__tmp` to `checksum` transformation)
regards,
Yura Sokolov
y(dot)sokolov(at)postgrespro(dot)ru
funny(dot)falcon(at)gmail(dot)com
PS. Test program in Crystal language is attached and output for current
CHECKSUM_COMP implementation and "correct" (intended).
Excuse me for Crystal, it is prettier to write for small compiled
scripts.
Attachment | Content-Type | Size |
---|---|---|
checksum_correct.out | text/plain | 479 bytes |
checksum_current.out | text/plain | 12.0 KB |
checksum.cr | text/x-c | 1.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2021-08-30 00:26:42 | Re: Write visibility map during CLUSTER/VACUUM FULL |
Previous Message | Pavel Stehule | 2021-08-29 20:46:47 | Re: Schema variables - new implementation for Postgres 15 |