Re: how to allow integer overflow for calculating hash code of a string?

From: Haifeng Liu <liuhaifeng(at)live(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: how to allow integer overflow for calculating hash code of a string?
Date: 2012-10-30 11:48:23
Message-ID: BLU0-SMTP411243653B17C74CF444B95B9620@phx.gbl
Lists: pgsql-admin

I found a way that works: accumulate in a bigint, masking it to 32 bits at each step, then cast it to bit(32), and finally cast that to int4.

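-- Function name and signature are hypothetical; the message showed only the body.
create or replace function java_hashcode(str text) returns integer as $$
declare
  h bigint := 0;  -- 64-bit accumulator; the FOR loop declares i implicitly
begin
  for i in 1..length(str) loop
    -- keep only the low 32 bits each step so h * 31 never overflows bigint
    h := (h * 31 + ascii(substring(str, i, 1))) & 4294967295;
  end loop;
  -- bigint -> bit(32) keeps the rightmost 32 bits; bit(32) -> int4 reads them as signed
  return cast(cast(h as bit(32)) as int4);
end;
$$ language plpgsql;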

I ran some tests covering both positive and negative results, and everything seems OK.
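For checking the approach from the Java side, here is a minimal sketch that mirrors the PL/pgSQL body above step by step. A `long` accumulator stands in for `bigint`, `& 0xFFFFFFFFL` is the same mask as `& 4294967295`, and the final `(int)` cast plays the role of `bit(32)` to `int4` by reinterpreting the low 32 bits as a signed value. The class name `MaskedHash` is hypothetical. One assumption to keep in mind: Java iterates over UTF-16 `char`s, while PostgreSQL's `ascii()` returns the Unicode code point in a UTF8 database, so the two should agree only for characters in the Basic Multilingual Plane.

```java
// Hypothetical Java mirror of the PL/pgSQL function: bigint -> long,
// "& 4294967295" -> "& 0xFFFFFFFFL", bit(32)::int4 -> (int) cast.
final class MaskedHash {
    static int hash(String str) {
        long h = 0;
        for (int i = 0; i < str.length(); i++) {
            // h < 2^32 before this step, so h * 31 + c < 2^37 fits easily in a long
            h = (h * 31 + str.charAt(i)) & 0xFFFFFFFFL;
        }
        // Narrowing to int keeps the low 32 bits and treats bit 31 as the sign
        return (int) h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "hello", "polygenelubricants", "The quick brown fox"};
        for (String s : samples) {
            System.out.println(s + " -> " + hash(s) + " (String.hashCode: " + s.hashCode() + ")");
        }
    }
}
```

Masking after every step gives the same result as letting `int` arithmetic wrap, because both compute the value modulo 2^32.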

On Sep 21, 2012, at 11:21 PM, Craig James <cjames(at)emolecules(dot)com> wrote:

> On Thu, Sep 20, 2012 at 7:56 PM, Haifeng Liu <liuhaifeng(at)live(dot)com> wrote:
>
> On Sep 20, 2012, at 10:34 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>
>>
>>
>> On Thu, Sep 20, 2012 at 1:55 AM, Haifeng Liu <liuhaifeng(at)live(dot)com> wrote:
>> I want to write a hash function that behaves like String.hashCode() in Java: hash = hash * 31 + s.charAt(i)..., but I get an "integer out of range" error. How can I avoid this? As far as I can see, Java does not care about int overflow; the result simply wraps and can become negative.
>>
>>
>> Use the bitwise AND operator to mask the hash value with 0x3FFFFFF before each iteration:
>>
>> hash = (hash & 67108863) * 31 + s.charAt(i);
>>
>> Craig
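A minimal Java sketch of the masked recurrence suggested above, hash = (hash & 67108863) * 31 + s.charAt(i), shows the trade-off. The class name `CraigMaskHash` is hypothetical. Because the hash is cut to 26 bits before each multiplication, every intermediate value fits in a signed 32-bit int and the result is never negative. Once the mask starts discarding bits, however, the result will generally stop matching `String.hashCode()`.

```java
// Hypothetical Java rendering of: hash = (hash & 67108863) * 31 + s.charAt(i)
final class CraigMaskHash {
    // 0x3FFFFFF == 67108863 == 2^26 - 1
    static final int MASK = 0x3FFFFFF;

    static int hash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // (2^26 - 1) * 31 + 65535 = 2080440288 < 2^31 - 1, so this never overflows
            hash = (hash & MASK) * 31 + s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "polygenelubricants", "a much longer sample string"};
        for (String s : samples) {
            System.out.println(s + ": masked=" + hash(s) + ", String.hashCode=" + s.hashCode());
        }
    }
}
```

For short strings such as "hello" the intermediate values stay below 2^26, so both functions agree. For longer strings they diverge.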
>
> Thank you. I believe your solution is fine for a hash function in general, but I am aiming to create a hash function that is consistent with the one the applications use. I know PostgreSQL 9.1 has a hash function called hashtext, but I don't know what algorithm it uses, and I have also read that relying on it is not recommended. So I am trying to create a hash function that behaves exactly like java.lang.String.hashCode(). The latter may produce negative hash values. My guess is that on overflow the out-of-range bits are simply discarded, and if the highest remaining bit is 1, the hash value becomes negative.
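The guess above matches how Java `int` arithmetic is specified: it is two's complement, overflow keeps only the low 32 bits, and a set bit 31 reads as negative. PostgreSQL's `integer` operators raise "integer out of range" instead. Java's `Math.multiplyExact` and `Math.addExact` behave similarly by throwing `ArithmeticException`. The small sketch below shows both behaviors. The class and method names are hypothetical.

```java
final class WrapDemo {
    // Plain int arithmetic: overflow silently keeps the low 32 bits
    static int step(int h, char c) {
        return h * 31 + c;
    }

    // Same step computed in 64 bits, then narrowed: (int) keeps the low 32 bits
    static int stepViaLong(int h, char c) {
        return (int) ((long) h * 31 + c);
    }

    // Refuses to wrap, much like PostgreSQL's "integer out of range" error
    static int stepExact(int h, char c) {
        return Math.addExact(Math.multiplyExact(h, 31), c);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
        System.out.println((int) 0x80000001L);                          // -2147483647
        int h = 123456789; // h * 31 exceeds Integer.MAX_VALUE
        System.out.println(step(h, 'x') == stepViaLong(h, 'x'));        // true
        try {
            stepExact(h, 'x');
        } catch (ArithmeticException e) {
            System.out.println("stepExact threw: " + e.getMessage());
        }
    }
}
```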
>
> You are probably doing something where you want the application and the database to implement exactly the same function, but if you stick to the Java built-in function, you only control one of the two implementations. What happens if someone working on Java changes how the Java internals work?

That's not a problem: creating a hash tool that copies the code of java.lang.String.hashCode(), and using that tool instead, resolves it. The key point is that I know the algorithm and can reimplement it.
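Such a tool could look like the minimal sketch below. The class name `StableStringHash` is hypothetical. It copies the formula that the `String.hashCode()` Javadoc documents, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, with 0 for the empty string. It also pins one hand-checked test vector, so any change in behavior would show up in a test instead of silently.

```java
// Hypothetical standalone copy of the documented String.hashCode() formula
final class StableStringHash {
    private StableStringHash() {}

    public static int of(CharSequence s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // int overflow wraps modulo 2^32, as the documented formula requires
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // "hello": 104 -> 3325 -> 103183 -> 3198781 -> 99162322
        System.out.println(of("hello")); // 99162322
        System.out.println(of(""));      // 0
    }
}
```

Accepting a `CharSequence` lets the same function hash a `StringBuilder` without first converting it to a `String`.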

>
> A better solution would be to implement your own hash function in Postgres, and then once you know exactly how it will work, re-implement it in Java with your own code. That's the only way you can ensure consistency between the two.
>
> Craig
