Re: Why hash indexes suck

From: pgsql(at)mohawksoft(dot)com
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: sailesh(at)cs(dot)berkeley(dot)edu, "Dann Corbit" <dcorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why hash indexes suck
Date: 2004-06-06 14:02:06
Message-ID: 16596.24.91.171.78.1086530526.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

> Sailesh Krishnamurthy <sailesh(at)cs(dot)berkeley(dot)edu> writes:
>> This is probably a crazy idea, but is it possible to organize the data
>> in a page of a hash bucket as a binary tree ?
>
> Only if you want to require a hash opclass to supply ordering operators,
> which sort of defeats the purpose I think. Hash is only supposed to
> need equality not ordering.
>

A btree is frequently used within the buckets of a hash table, expecially
if you expect to have a large number of items in each bucket.

If PostgreSQL could create a hash table index which is a single top level
hash table with each hash bucket being a btree index. You can eliminate a
number of btree searches by hashing, and then fall into btree performance
after the first hash lookup. The administrator should be able to gather
statistics about the population of the hash buckets and rehash if
performance begins to behave like a btree or the data is not distributed
evenly. Given proper selection of the initial number of buckets, a hash
table could blow btree out of the water. Given a poor selection of the
number of buckets, i.e. 1, a hash will behave no worse than a btree.

Also, it would be helpful to be able to specify a hash function during the
create or rehash, given a specific class of data, extraction of the random
elements can be more efficient and/or effective given knowledge of the
data. Think something like "bar codes," there are portions of the data
which are ususally the same and portions of the data which are usually
different. Focusing on the portions of the data which tend to be different
will generally provide a more evenly distributed hash.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruno Wolff III 2004-06-06 15:07:51 Re: row access
Previous Message Jeff Davis 2004-06-06 09:21:14 Re: [HACKERS] Slony-I goes BETA (possible bug)

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2004-06-06 17:32:12 Re: [HACKERS] Slony-I goes BETA (possible bug)
Previous Message Andreas Pflug 2004-06-06 09:34:01 pgsql 7.5 frontend changes wrapup