On Tue, Nov 2, 2010 at 12:34 PM, Kevin Grittner wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> The goal is to make those hard to predict, though.
>>> Really? I think "I don't understand when this fails" isn't
>>> obviously better than being able to predict when it fails ...
>> Isn't that the whole point of hash functions? Collisions are
>> inevitable, but you want them to be unpredictable.
> Are you sure you're not confusing attributes of a good random number
> generator with hash generation? (They have much in common, but I've
> only seen concern with "hard to predict" when working on random
> number algorithms, as for financial audits or jury selection.)
>> If you want a hash function with predictable collision behavior,
>> just hash the first element. It'll be highly predictable AND
>> wicked fast.
> You're confusing "unpredictable" with "widely distributed", I think.
> There's no reason that the hash value of an integer of the same size
> as the hash shouldn't be the integer itself, for example. It's hard
> to get more predictable than that, yet it works well for hash lookups.
Well, no, not really. For example, you may have a hash table with N
buckets and values of the form N+k, 2N+k, 3N+k, ..., and everything
will collide into the same bucket.
If you do some mixing of the bits, that case is far less likely to
occur in practice.
See for example http://www.azillionmonkeys.com/qed/hash.html
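To make that concrete, here's a toy sketch (made up for illustration; the
bucket count, keys, and mixer aren't from anything in this thread)
comparing an identity hash with a bit-mixing hash on exactly that kind of
key pattern:

/*
 * Toy illustration, not PostgreSQL code: with an identity hash and a
 * table of NBUCKETS buckets, keys of the form k, N+k, 2N+k, ... all land
 * in the same bucket, while a hash that mixes the bits spreads them out.
 */
#include <stdio.h>
#include <stdint.h>

#define NBUCKETS 1024            /* any power of two shows the effect */

static uint32_t identity_hash(uint32_t x)
{
    return x;
}

/* An example bit mixer (MurmurHash3's 32-bit finalizer). */
static uint32_t mixed_hash(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6b;
    x ^= x >> 13;
    x *= 0xc2b2ae35;
    x ^= x >> 16;
    return x;
}

int main(void)
{
    for (uint32_t i = 0; i < 5; i++)
    {
        uint32_t key = i * NBUCKETS + 7;   /* 7, N+7, 2N+7, ... */

        printf("key %7u  identity bucket %4u  mixed bucket %4u\n",
               key,
               identity_hash(key) % NBUCKETS,
               mixed_hash(key) % NBUCKETS);
    }
    return 0;
}

The identity hash puts every one of those keys in bucket 7; the mixed hash
scatters them.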
> I know that for a hash of a character string, the recurrence total =
> (total * 31) + nextcharacter has been shown to be effective. That's hardly
> random or hard to predict, but it does tend to spread out the hash
> values well. Whether it is as good for arrays, I don't know. It
> seems like a reasonable place to start, though, since a character
> string is rather like an array of characters.
That was my thought.
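Roughly what I have in mind for arrays is that same recurrence applied per
element -- a rough sketch only (the function name and the idea of passing
in precomputed element hashes are placeholders, not the actual patch):

/*
 * Sketch: combine per-element hash values the way the classic string
 * hash combines characters.  elem_hashes[] stands in for whatever the
 * element type's own hash function produced.
 */
#include <stdint.h>

static uint32_t
hash_array_sketch(const uint32_t *elem_hashes, int nelems)
{
    uint32_t    result = 0;
    int         i;

    for (i = 0; i < nelems; i++)
        result = (result * 31) + elem_hashes[i];

    return result;
}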
>>> What concerns me about that is that it tends to push the bits to
>>> the left --- I think the effects of the earlier inputs are going
>>> to overflow out as you incorporate a lot of newer inputs. What
>>> you want is a scheme where every input item affects all the bits
>>> of the final result about equally, and my gut feeling is this
>>> doesn't provide that.
>> I don't think so. That would be a problem if you multiplied by an
>> even number. You can see that you don't get dups if you just add
>> in the same value over and over:
> Yeah, that 1 in the low order position ensures that the impact of
> any value which is ever added into the total is never entirely lost.
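Here's a quick demonstration of the difference (made up for this mail, not
output from the patch): fold a run of identical elements onto two
different first elements, once with an odd multiplier and once with an
even one.

#include <stdio.h>
#include <stdint.h>

/* Start from 'first' and fold in forty identical elements. */
static uint32_t
fold(uint32_t first, uint32_t multiplier)
{
    uint32_t    total = first;
    int         i;

    for (i = 0; i < 40; i++)
        total = total * multiplier + 1;

    return total;
}

int main(void)
{
    /* Odd multiplier (31): changing the first element changes the result. */
    printf("odd  multiplier: first=1 -> %u, first=2 -> %u\n",
           fold(1, 31), fold(2, 31));

    /*
     * Even multiplier (32, i.e. a left shift by 5 bits): after enough
     * inputs the first element has overflowed out of the 32-bit word
     * entirely, so the two totals come out identical.
     */
    printf("even multiplier: first=1 -> %u, first=2 -> %u\n",
           fold(1, 32), fold(2, 32));
    return 0;
}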
The Enterprise PostgreSQL Company