Re: Combining hash values

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Combining hash values
Date:	2016-08-01 15:27:15
Message-ID:	6883.1470065235@sss.pgh.pa.us
Lists:	pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On that subject, while looking at hashfunc.c, I spotted that
> hashint8() has a very obvious deficiency, which causes disastrous
> performance with certain inputs:

Well, if you're trying to squeeze 64 bits into a 32-bit result, there
are always going to be collisions somewhere.
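
To illustrate the point above, here is a minimal sketch in C of one way a 64-bit key can be folded into 32 bits before hashing, and why that makes some collisions unavoidable. It is an assumption for illustration only, not the code in hashfunc.c: fold_hash64 and mix32 are hypothetical names, and mix32 is a generic integer mixer (in the style of the MurmurHash3 finalizer) standing in for a real 32-bit hash.

```c
#include <stdint.h>

/*
 * Hypothetical 32-bit mixer, used here only as a stand-in for a real
 * 32-bit hash function. It is NOT PostgreSQL's hash_uint32().
 */
static uint32_t
mix32(uint32_t k)
{
    k ^= k >> 16;
    k *= 0x85ebca6bU;
    k ^= k >> 13;
    k *= 0xc2b2ae35U;
    k ^= k >> 16;
    return k;
}

/*
 * Illustrative fold: XOR the high and low halves of a 64-bit key, then
 * mix the 32-bit result. Any two keys whose halves XOR to the same
 * value collide before mix32() ever sees them.
 */
static uint32_t
fold_hash64(int64_t val)
{
    uint32_t lo = (uint32_t) val;
    uint32_t hi = (uint32_t) ((uint64_t) val >> 32);

    return mix32(lo ^ hi);
}
```

With this particular fold, every key of the form (i << 32) | i has identical halves, so they all fold to 0 and share a single hash value no matter how good mix32 is. A different fold would move the collisions elsewhere but could not eliminate them.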

> I'd suggest using hash_uint32() for values that fit in a 32-bit
> integer and hash_any() otherwise.

Perhaps, but this'd break existing hash indexes. That might not be
a fatal objection, but if we're going to put users through that
I'd like to think a little bigger in terms of the benefits we get.
I've thought for some time that we needed to move to 64-bit hash function
results, because the problem sizes for which a hash join or hash
aggregation is reasonable keep increasing. Maybe we should do that and fix
hashint8 as a side effect.

regards, tom lane

In response to

Re: Combining hash values at 2016-08-01 11:24:15 from Dean Rasheed

Responses

Re: Combining hash values at 2016-08-01 22:19:22 from Robert Haas
