Re: Hash support for arrays

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash support for arrays
Date: 2010-10-31 02:12:24
Message-ID: AANLkTimkytUv9F7=bWkFCqz=CPyYzbP3c9idJXY6zmn8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 30, 2010 at 1:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hmm, you mean use hash_any on the sequence of hash values?  Interesting
> idea; but hash_any doesn't look very amenable to being invoked
> incrementally, and I don't think I want to construct a temp array with
> enough space for the hashes of all the items in the input array ...

I had rather hoped without evidence that it would be easy enough to
use incrementally. Looking now I see your point; it doesn't lend
itself to that at all.

I suppose you could use crc where our interface does allow incremental use.

I'm not really familiar enough with the math to know whether CRC is
any better than your XOR mixing. I suspect they're pretty similar. I
suppose if you have arrays of various lengths containing repetitions
of a single value than the non-CRC would produce a hash of 0 for all
arrays with a length that's a multiple of 32. I'm not sure what CRC
would do.

Oh, one question that occurred to me you didn't mention including the
bounds of the array or the null bitmap in the hash function. I suppose
it's correct with or without them (or in the case of the null bitmap
another way to put it is the choice of the hash value to represent
null is arbitrary).

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-10-31 02:28:01 Re: Hash support for arrays
Previous Message Tom Lane 2010-10-30 22:59:30 Re: ALTER OBJECT any_name SET SCHEMA name