Quick Links

Why hash indexes suck

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Dann Corbit" <DCorbit(at)connx(dot)com>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Why hash indexes suck
Date:	2004-06-05 20:03:39
Message-ID:	499.1086465819@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-hackers

"Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> There seems to be something seriously defective with hash indexes in old
> versions of PostgreSQL.

They still suck; I'm not aware of any situation where I'd recommend hash
over btree indexes in Postgres. I think we have fixed the hash indexes'
deadlock problems as of 7.4, but there's still no real performance
advantage.

I just had an epiphany as to the probable reason why the performance sucks.
It's this: the hash bucket size is the same as the page size (ie, 8K).

This means that if you have only one or a few items per bucket, the
information density is awful, and you lose big on I/O requirements
compared to a btree index. On the other hand, if you have enough
items per bucket to make the storage density competitive, you will
be doing linear searches through dozens if not hundreds of items that
are all in the same bucket, and you lose on CPU time (compared to btree
which can do binary search to find an item within a page).

It would probably be interesting to look into making the hash bucket
size be just a fraction of a page, with the intent of having no more
than a couple dozen items per bucket. I'm not sure what the
implications would be for intra-page storage management or index locking
conventions, but offhand it seems like there wouldn't be any
insurmountable problems.

I'm not planning on doing this myself, just throwing it out as a
possible TODO item for anyone who's convinced that hash indexes ought
to work better than they do.

regards, tom lane

In response to

Re: TimeOf(Subselects|Joins)FromLargeTables? at 2004-06-04 20:28:28 from Dann Corbit

Responses

Re: Why hash indexes suck at 2004-06-05 20:15:25 from Sailesh Krishnamurthy
Re: Re: We have got a serious problem with pg_clog/WAL synchronization at 2004-08-12 13:21:17 from Kenneth Marshall

Browse pgsql-general by date

	From	Date	Subject
Next Message	Sailesh Krishnamurthy	2004-06-05 20:15:25	Re: Why hash indexes suck
Previous Message	CSN	2004-06-05 17:38:03	timestamp outside of bounds

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Garamond	2004-06-05 20:11:02	Re: [HACKERS] Not 7.5, but 8.0 ?
Previous Message	Andrew Dunstan	2004-06-05 19:51:01	Re: Official Freeze Date for 7.5: July 1st, 2004