Re: Hash support for arrays

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash support for arrays
Date: 2010-11-03 04:22:40
Message-ID: 4CD0E390.4080708@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> It's possible that the multiply-by-31 method is as good as the
> rotate-and-xor method by that measure, or even better; but it's far from
> obvious that it's better. And I'm not convinced that the multiply
> method has a pedigree that should encourage me to take it on faith.

Short summary:
* some history of it's trivia follows
* (nothing here suggests it's better - just old and common and cheap)

Longer - some trivia regarding its pedigree:

It (or at least a *31 variant) seems to have a history of advocacy
going back to Chris Torek in 1990:
http://groups.google.com/group/comp.lang.c/browse_thread/thread/28c2095282f0c1b5/193be99e9836791b?q=#193be99e9836791b

X#define HASH(str, h, p) \
X for (p = str, h = 0; *p;) h = (h << 5) - h + *p++

and gets referred to in Usenet papers in the early 90's as well:
http://www.usenix.com/publications/library/proceedings/sa92/salz.pdf

Regarding "why the magic number 31 [or 33 which also often comes
up]" apparently the only thing magic about it is that it's an
odd number != 1. The rest of the odd numbers work about as well
according to this guy who tried to explain it:

http://svn.eu.apache.org/repos/asf/apr/apr/trunk/tables/apr_hash.c
* The magic of number 33, i.e. why it works better than many other
* constants, prime or not, has never been adequately explained by
* anyone. So I try an explanation: if one experimentally tests all
* multipliers between 1 and 256 (as I did while writing a low-level
* data structure library some time ago) one detects that even
* numbers are not useable at all. The remaining 128 odd numbers
* (except for the number 1) work more or less all equally well.
* They all distribute in an acceptable way and this way fill a hash
* table with an average percent of approx. 86%.
*
* If one compares the chi^2 values of the variants (see
* Bob Jenkins ``Hashing Frequently Asked Questions'' at
* http://burtleburtle.net/bob/hash/hashfaq.html for a description
* of chi^2), the number 33 not even has the best value. But the
* number 33 and a few other equally good numbers like 17, 31, 63,
* 127 and 129 have nevertheless a great advantage to the remaining
* numbers in the large set of possible multipliers: their multiply
* operation can be replaced by a faster operation based on just one
* shift plus either a single addition or subtraction operation. And
* because a hash function has to both distribute good _and_ has to
* be very fast to compute, those few numbers should be preferred.
...

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc Cousin 2010-11-03 08:03:21 Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
Previous Message Tom Lane 2010-11-02 22:52:40 Re: [PATCH] V3: Idle in transaction cancellation