Quick Links

Re: What is wrong with hashed index usage?

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Michael Loftis <mloftis(at)wgops(dot)com>, Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What is wrong with hashed index usage?
Date:	2002-04-25 20:38:00
Message-ID:	200204252038.g3PKc0h15943@candle.pha.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Nice report. I think we should start thinking of hiding the hash option
from users, or warn them more forcefully, rather than hold it out as a
possible option for them.

People think hash is best for equals-only queries, and btree for others,
and we can now see this clearly isn't the case.

---------------------------------------------------------------------------

Tom Lane wrote:
> Michael Loftis <mloftis(at)wgops(dot)com> writes:
> > [ on hash vs btree indexing ]
> > Most of the time until the btree gets deep they are nearly equivalent.
> > When the tree ends up becoming many levels deep it can take longer to
> > walk than the hash.
>
> Maybe. I've just completed a simple benchmark of btree vs hash indexes
> as implemented in Postgres, and I can't see any advantage.
>
> Using current sources on Red Hat Linux 7.2, I built a simple test table
> containing one integer column, and filled it with 16 million random
> integers generated by int4(1000000000 * random()). With a btree index,
> "explain analyze select * from foo where f1 = 314888455" (matching a
> single row of the table) took about 22 msec on first try (nothing in
> cache), and subsequent repetitions about 0.11 msec. With a hash index,
> the first try took about 28 msec and repetitions about 0.15 msec.
> Moreover, the hash index was a whole lot bigger: main table size 674
> meg, btree 301 meg, hash 574 meg, which possibly offers part of the
> explanation for the greater access time.
>
> I would have tried a larger test case, but this one already taxed
> my patience: it took 36 hours to build the hash index (vs 19 minutes
> for the btree index). It looks like hash index build has an O(N^2)
> performance curve --- the thing had 100 meg of hash index built within
> an hour of starting, but got slower and slower after that.
>
> In short, lack of support for concurrent operations is hardly the
> worst problem with Postgres' hash indexes. If you wanna fix 'em,
> be my guest ... but I think I shall spend my time elsewhere.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Re: What is wrong with hashed index usage? at 2002-04-25 20:05:49 from Tom Lane

Responses

Re: What is wrong with hashed index usage? at 2002-04-25 21:04:44 from Neil Conway

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Vince Vielhaber	2002-04-25 21:04:30	Re: Vote totals for SET in aborted transaction
Previous Message	Bruce Momjian	2002-04-25 20:32:25	Re: Vote totals for SET in aborted transaction