Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL
Date: 2005-05-10 18:12:45
Message-ID: 1115748765.4280f99d52dd5@webmail.telus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Quoting "Jim C. Nasby" <decibel(at)decibel(dot)org>:

> I'm not really familiar enough with hash indexes to know if this
> would
> work, but if the maximum bucket size was known you could use that to
> determine a maximum range of buckets to look at. In some cases, that
> range would include only one bucket, otherwise it would be a set of
> buckets. If you found a set of buckets, I think you could then just
> go
> to the specific one you need.
>
> If we assume that the maximum bucket size is one page it becomes
> more
> realistic to take an existing large bucket and split it into several
> smaller ones. This could be done on an update to the index page, or
> a
> background process could handle it.
>
> In any case, should this go on the TODO list?
>
> > Allowing a bucket size to be specified at CREATE INDEX doesn't seem
> out
> > of line though. We'd have to think up a scheme for
> index-AM-specific
> > index parameters ...
> --
> Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
> Give your computer some brain candy! www.distributed.net Team #1828

Google "dynamic hash" or "linear hash". It takes care of not needing to
have varying bucket sizes.

Hash indexes are useful if you ALWAYS require disk access; they behave
like worst-case random cache-thrash tests. That's probably why dbs have
gravitated toward tree indexes instead. On the other hand, there's more
(good) to hashing than initially meets the eye.

Dynamic multiway hashing has come a long way from just splicing the bits
together from multiple columns' hash values. If you can lay your hands
on Tim Merrett's old text "Relational Information Systems", it's an
eye-opener. Picture an efficient terabyte spreadsheet.

For one thing, unlike a btree, a multicolumn hash is symmetric: it
doesn't matter which column(s) you do not specify in a partial match.

For another, a multiway hash is useful for much lower selectivity than a
btree. I built such indexes for OLAP cubes, and some dimensions were
only 10 elements wide. At the point where btree indexing becomes worse
than seqscan, a multiway hash tells you which 10% of the disk to scan.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Brendan Jurd 2005-05-10 18:13:48 Re: Shorthand for foreign key indices
Previous Message Tom Lane 2005-05-10 18:07:00 Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Stark 2005-05-10 18:13:11 Re: Prefetch
Previous Message Tom Lane 2005-05-10 18:07:00 Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL