jff: checksum algorithm is not as intended

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: jff: checksum algorithm is not as intended
Date: 2021-08-30 00:18:33
Message-ID: 7d018a5e73186a08f891e46fa25bdf18@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Good day.

Current checksum is not calculated in intended way and
has the flaw.

Single round function is written as:

#define CHECKSUM_COMP(checksum, value) do {\
uint32 __tmp = (checksum) ^ (value);\
(checksum) = __tmp * FNV_PRIME ^ (__tmp >> 17);\
} while (0)

And looks like it was intended to be
(checksum) = (__tmp * FNV_PRIME) ^ (__tmp >> 17);

At least original Florian Pflug suggestion were correctly written
in this way (but with shift 1):
https://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396%40phlo.org

But due to C operation precedence it is actually calculated as:
(checksum) = __tmp * (FNV_PRIME ^ (__tmp >> 17));

It has more longer collision chains and worse: it has 256 pathological
result slots of shape 0xXX000000 each has 517 collisions in average.
Totally 132352 __tmp values are collided into this 256 slots.

That is happens due to if top 16 bits are happens to be
0x0326 or 0x0327, then `FNV_PRIME ^ (__tmp >> 17) == 0x1000000`,
and then `__tmp * 0x1000000` keeps only lower 8 bits. That means,
9 bits masked by 0x0001ff00 are completely lost.

mix(0x03260001) == mix(0x03260101) = mix(0x0327aa01) == mix(0x0327ff01)
(where mix is a `__tmp` to `checksum` transformation)

regards,
Yura Sokolov
y(dot)sokolov(at)postgrespro(dot)ru
funny(dot)falcon(at)gmail(dot)com

PS. Test program in Crystal language is attached and output for current
CHECKSUM_COMP implementation and "correct" (intended).
Excuse me for Crystal, it is prettier to write for small compiled
scripts.

Attachment Content-Type Size
checksum_correct.out text/plain 479 bytes
checksum_current.out text/plain 12.0 KB
checksum.cr text/x-c 1.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-08-30 00:26:42 Re: Write visibility map during CLUSTER/VACUUM FULL
Previous Message Pavel Stehule 2021-08-29 20:46:47 Re: Schema variables - new implementation for Postgres 15