Re: What is wrong with hashed index usage?

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Neil Conway" <nconway(at)klamath(dot)dyndns(dot)org>, <mloftis(at)wgops(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What is wrong with hashed index usage?
Date: 2002-06-21 20:20:18
Message-ID: D90A5A6C612A39408103E6ECDD77B82906F47C@voyager.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman(at)candle(dot)pha(dot)pa(dot)us]
> Sent: Friday, June 21, 2002 9:52 AM
> To: Tom Lane
> Cc: Neil Conway; mloftis(at)wgops(dot)com; Dann Corbit;
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > I remember three problems: build time, index size, and
> concurrency
> > > problems. I was wondering about the equal key case
> myself, and assumed
> > > hash may be a win there, but with the concurrency
> problems, is that even
> > > possible?
> >
> > Sure. Many-equal-keys are a problem for btree whether you have any
> > concurrency or not.
> >
> > > OK, I have reworded it. Is that better?
> >
> > It's better, but you've still discarded the original's
> explicit mention
> > of concurrency problems. Why do you want to remove information?
>
> OK, concurrency added. How is that?
>
> >
> > > How about an elog(NOTICE) for hash use?
> >
> > I don't think that's appropriate.
>
> I was thinking of this during CREATE INDEX ... hash:
>
> NOTICE: Hash index use is discouraged. See the CREATE INDEX
> reference page for more information.
>
> Does anyone else like/dislike that?

I think it might be OK temporarily, to show that there is some work that
needs done. When hashed indexes are fixed, the notice should be
removed.

I have not looked at the hash code. Here is a strategy (off the top of
my head) that seems that it should work:

Use Bob Jenkins' 64 bit generic hash from here (totally free for use and
fast as blazes):
http://burtleburtle.net/bob/hash/index.html

Specifically:
http://burtleburtle.net/bob/c/lookup8.c and routine: hash( k, length,
level)

Now, with a 64 bit hash, there is very tiny probability of a collision
(but you could have duplicate data).
The hash index would consist of nothing more than this:
[long long hash=64 bit hash code][unsigned nmatches=count of matching
hashes][array of {nmatches} pointers directly to the records with that
hash]

This is probably grotesqely oversimplified. But maybe it will spur an
indea in the person who writes the indexing code.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2002-06-21 20:31:17 Re: What is wrong with hashed index usage?
Previous Message Larry Rosenman 2002-06-21 20:13:55 Re: What is wrong with hashed index usage?