| From: | Ilyasov Ian <ianilyasov(at)outlook(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | need clarification about hash_bytes() non-determinitstic behaviour between Little Endian and Big Endian |
| Date: | 2026-06-10 14:54:52 |
| Message-ID: | AS1PR01MB9347D394AC87A0C56FD8E3D2CD1A2@AS1PR01MB9347.eurprd01.prod.exchangelabs.com |
| Lists: | pgsql-hackers |
Hello everyone!
Recently I've been looking into hashfn.c and noticed differing output between the local
variables I inspected in GDB and a custom example that calls hash_bytes(). The example is
attached; it was compiled and run on a Big Endian machine like this:
gcc -Wall -Wextra -DWORDS_BIGENDIAN -g -O0 test_hash.c -o test_hash
./test_hash test
before final a=316197320 b=2658358868 c=2658358868
3358672099
./test_hash testtesttest
before final a=3395140076 b=3735912863 c=4252500541
2256767852
Little Endian:
gcc -Wall -Wextra -g -O0 test_hash.c -o test_hash
./test_hash test
before final a=317111240 b=2658358868 c=2658358868
1771415073
./test_hash testtesttest
before final a=572913213 b=3185033534 c=3535459743
547154463
However, the output will be the same if the input bytes
are a palindrome.
Big Endian:
./test_hash deed
before final a=47758264 b=2658358868 c=2658358868
1406051429
Little Endian:
./test_hash deed
before final a=47758264 b=2658358868 c=2658358868
1406051429
After looking inside hash_bytes() I found the reason for this.
When the function takes the word-aligned branch, both the Big Endian and the
Little Endian code perform the same += operation on the variable 'a':
/* Code path for aligned source data */
const uint32 *ka = (const uint32 *) k;
...
#ifdef WORDS_BIGENDIAN
...
case 4:
a += ka[0];
break;
#else /* !WORDS_BIGENDIAN */
...
case 4:
a += ka[0];
break;
...
#endif /* WORDS_BIGENDIAN */
In my example ka[0] holds the four bytes of 'test' packed into one 32-bit word.
Because the byte order differs, the same bytes are read as a different integer,
so 'a' gets a different value after this operation.
That is also why palindromes make hash_bytes() return the same value.
However, if the provided example is compiled without -DWORDS_BIGENDIAN,
hash_bytes() returns the same value on Big Endian as it does on Little Endian.
So my question is: is hash_bytes() required to return the same result regardless of
endianness, or am I missing some logic under #ifdef WORDS_BIGENDIAN?
Kind regards,
Ian Ilyasov.
| Attachment | Content-Type | Size |
|---|---|---|
| test_hash.c | text/x-csrc | 9.8 KB |