Re: Hash support for arrays

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash support for arrays
Date: 2010-11-02 20:46:18
Message-ID: 20101102204618.GW27429@aart.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 02, 2010 at 04:42:19PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Tue, Nov 2, 2010 at 11:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Really? I think "I don't understand when this fails" isn't obviously
> >> better than being able to predict when it fails ...
>
> > Isn't that the whole point of hash functions? Collisions are
> > inevitable, but you want them to be unpredictable. If you want a hash
> > function with predictable collision behavior, just has the first
> > element. It'll be highly predictable AND wicked fast.
>
> That seems like a rather poor straw man, since it suffers from exactly
> the defect I'm complaining about, namely failing to consider all the
> input values equally.
>
> >> I'd be happier about this approach if there were some actual theory
> >> behind it ... maybe there's some analysis out there, but the one link
> >> that was given was just about entirely unconvincing.
>
> > I think it's from Knuth, though unfortunately I don't have a copy to
> > check. There are probably better algorithms out there, but this one's
> > pretty simple.
>
> I don't see anything in Knuth suggesting a multiplier of 31. His
> recommendation for a multiplier, if you're going to use multiplicative
> hashing, is wordsize/phi (phi being the golden ratio) ... and he also
> wants you to keep the high order not the low order bits of the product.
>
> However, this is largely beside the point, because that theory, as well
> as the Java code you're arguing from, has to do with the initial hashing
> of a raw sequence of input items. Not with combining some existing hash
> values. The rotate-and-xor method I suggested for that is borrowed
> exactly from section 6.4 of Knuth (page 512, in the first edition of
> volume 3).
>
> regards, tom lane
>

Given that our hash implimentation mixes the input data well (It does.
I tested it.) then a simple rotate-and-xor method is all that should
be needed to maintain all of the needed information. The original
hash function has done the heavy lifting in this case.

Regards,
Ken

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sam Mason 2010-11-02 21:26:22 Re: Hash support for arrays
Previous Message Tom Lane 2010-11-02 20:42:19 Re: Hash support for arrays