Re: using hash index when BETWEEN is specified

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Asko Oja <ascoja(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: using hash index when BETWEEN is specified
Date: 2008-09-10 16:12:48
Message-ID: 29988.1221063168@sss.pgh.pa.us
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> I think it depends on the ratio of distinct integer values in the table
> to the number of values in the requested interval, on the value
> distribution, and on the total number of rows.

> For example, if you have 10 distinct numbers and each occurs 100 times,
> then a full scan is better (for BETWEEN 1 AND 5). But if each number
> occurs 100000 times, then using a hash index should be effective.

I think this discussion is a complete waste of time. Hash indexes don't
win against btrees for single indexscans currently. Even if that ever
gets fixed, it's highly unlikely that they'd win for N separate
indexscans versus 1 indexscan, which is what a query rewrite of this
sort would produce. Remember that the btree will have the desired range
of keys stored adjacently, whereas in a hash they are almost certainly
in distinct buckets, and likely not even close-together buckets if the
hash function is doing its job well. So you really are talking about a
factor of N both in indexscan setup overhead and in I/O costs.
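
The factor-of-N argument can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL's actual index code: the page capacity, the bucket count, and the multiplicative hash are all hypothetical stand-ins, and the "btree" is just a sorted list. It compares the leaf pages touched by one range scan with the buckets probed when BETWEEN 1 AND 5 is rewritten as five equality lookups.

```python
from bisect import bisect_left, bisect_right

KEYS_PER_PAGE = 100  # hypothetical leaf-page capacity
N_BUCKETS = 1024     # hypothetical number of hash buckets


def btree_leaf_pages(sorted_keys, lo, hi):
    """Leaf pages touched by ONE range scan over a sorted key list."""
    start = bisect_left(sorted_keys, lo)   # single descent to the first match
    stop = bisect_right(sorted_keys, hi)   # matches are adjacent from here on
    if start == stop:
        return set()
    return set(range(start // KEYS_PER_PAGE, (stop - 1) // KEYS_PER_PAGE + 1))


def toy_hash(key):
    # Knuth-style multiplicative hash; a stand-in, NOT PostgreSQL's hash function.
    return (key * 2654435761) % 2**32


def hash_buckets(lo, hi):
    """Buckets probed when BETWEEN lo AND hi becomes N separate equality scans."""
    return [toy_hash(k) % N_BUCKETS for k in range(lo, hi + 1)]


keys = list(range(1, 1001))            # 1000 distinct integer keys, already sorted
pages = btree_leaf_pages(keys, 1, 5)   # one scan over adjacent keys
buckets = hash_buckets(1, 5)           # five scans, one per key

print("btree leaf pages touched:", len(pages))
print("hash probes:", len(buckets), "distinct buckets:", len(set(buckets)))
```

In this model the range scan stays on a single leaf page, while the rewritten query makes five probes that land in five different buckets, so both the per-scan setup cost and the page fetches grow with N.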

regards, tom lane
