Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Christopher Petrilli <petrilli(at)gmail(dot)com>, Ying Lu <ying_lu(at)cs(dot)concordia(dot)ca>, pgsql-general(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL
Date: 2005-05-10 05:29:48
Message-ID: 428046CC.4090601@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Tom Lane wrote:
> I have a gut reaction against that: it makes hash indexes fundamentally
> subservient to btrees.

I wouldn't say "subservient" -- if there is no ordering defined for the
index key, we just do a linear scan.

> However: what about storing the things in hashcode order? Ordering uint32s
> doesn't seem like any big conceptual problem.

Hmm, my memory of the hash code is a bit fuzzy. Do I understand correctly?

- we only use some of the bits in the hash to map from the hash of a key
to its bucket

- therefore within a bucket, we can still distinguish most of the
non-equal tuples from one another by comparing their full hash values

- if we keep the entries in a bucket (or page, I guess -- per earlier
mail) sorted by full hash value, we can use that to perform a binary search

Sounds like a good idea to me. How likely is it that the hash index will
be sufficiently large that we're using most of the bits in the hash just
to map hash values to buckets, so that the binary search won't be very
effective? (At this point many of the distinct keys in each bucket will
be full-on hash collisions, although sorting by the key values
themselves would still be effective.)

> I think that efficient implementation of this would require explicitly
> storing the hash code for each index entry, which we don't do now, but
> it seems justifiable on multiple grounds --- besides this point, the
> search could avoid doing the data-type-specific comparison if the hash
> code isn't equal.

Another benefit is that it would speed up page splits -- there would be
no need to rehash all the keys in a bucket when doing the split.

-Neil

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Marek Lewczuk 2005-05-10 05:41:58 Re: [PHP] Any experiance with PostgreSQL and SQLRelay
Previous Message Tom Lane 2005-05-10 04:54:48 Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Stark 2005-05-10 06:12:17 Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL
Previous Message Tom Lane 2005-05-10 04:54:48 Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL