need clarification about hash_bytes() non-deterministic behaviour between Little Endian and Big Endian

From: Ilyasov Ian <ianilyasov(at)outlook(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: need clarification about hash_bytes() non-deterministic behaviour between Little Endian and Big Endian
Date: 2026-06-10 14:54:52
Message-ID: AS1PR01MB9347D394AC87A0C56FD8E3D2CD1A2@AS1PR01MB9347.eurprd01.prod.exchangelabs.com
Lists: pgsql-hackers

Hello everyone!

Recently I've been looking into hashfn.c and noticed different output between what I saw
when inspecting local variables in GDB and what a custom example calling hash_bytes() printed.
The example is attached; it was compiled and run on a Big Endian machine as follows:
gcc -Wall -Wextra -DWORDS_BIGENDIAN -g -O0 test_hash.c -o test_hash

./test_hash test
before final a=316197320 b=2658358868 c=2658358868
3358672099

./test_hash testtesttest
before final a=3395140076 b=3735912863 c=4252500541
2256767852

Little Endian:
gcc -Wall -Wextra -g -O0 test_hash.c -o test_hash

./test_hash test
before final a=317111240 b=2658358868 c=2658358868
1771415073

./test_hash testtesttest
before final a=572913213 b=3185033534 c=3535459743
547154463

However, the output is the same if the input bytes form a palindrome,
as with the 4-byte input 'deed':
Big Endian:
./test_hash deed
before final a=47758264 b=2658358868 c=2658358868
1406051429

Little Endian:
./test_hash deed
before final a=47758264 b=2658358868 c=2658358868
1406051429

Looking inside hash_bytes(), I found the reason for this.
When the function takes the word-aligned branch, both the big-endian and the
little-endian variants perform the same += operation on the variable 'a':

/* Code path for aligned source data */
const uint32 *ka = (const uint32 *) k;
...
#ifdef WORDS_BIGENDIAN
...
case 4:
a += ka[0];
break;
#else /* !WORDS_BIGENDIAN */
...
case 4:
a += ka[0];
break;
...
#endif /* WORDS_BIGENDIAN */

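In my example, ka[0] holds the four bytes of 'test', which fit in a single 32-bit word.
Because ka[0] is read in the machine's native byte order, 'a' ends up with a different
value on each architecture after this operation. This is also why palindromes make
hash_bytes() return the same value: their bytes read the same in either order.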

However, if the provided example is compiled without -DWORDS_BIGENDIAN,
hash_bytes() returns the same value on Big Endian as it does on Little Endian.

So my question is: is hash_bytes() supposed to return the same result regardless of
endianness, or am I missing some logic under #ifdef WORDS_BIGENDIAN?

Kind regards,
Ian Ilyasov.

Attachment: test_hash.c (text/x-csrc, 9.8 KB)
