Re: Combining hash values

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Combining hash values
Date: 2016-07-31 16:34:12
Message-ID: 21850.1469982852@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> 2. I suspect that this algorithm for combining hashes is weak, and
> could amplify weaknesses in the hash functions feeding it.

Very possibly, but ...

> Compare
> Boost's hash_combine, which does some more work before XORing:

> seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);

I can't help being reminded of Knuth's story about he tried to invent
the world's best random number generator, and was disappointed when
it almost immediately converged to a short repeating sequence. If
there's any actual theoretical basis to the above, I'd be interested
to see it. But as-is, the use of addition rather than XOR looks fishy,
and the way the old seed is shifted around looks more likely to result
in cancellation than anything else.

> That constant approximates the golden ratio (as a fraction of the 32
> bit hash space), and it also appears in our hash_any and hash_uint32
> functions.

I think it's just somebody's idea of a magic random number. Your link
> [3] http://cstheory.stackexchange.com/questions/9639/how-did-knuth-derive-a
provides some reasons to think it might be a good choice for a very
specific application, but this is not that application --- in particular,
it doesn't involve multiplying by that constant.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shay Rojansky 2016-07-31 16:38:18 Re: Slowness of extended protocol
Previous Message Tom Lane 2016-07-31 16:18:50 Re: Hash indexes and effective_cache_size in CREATE INDEX documentation