Re: Hash Indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-09-21 12:29:59
Message-ID: CA+TgmoYF3O5=MYJ8HYH-g5RuKKX3BD0yGXDP7vU5JiUamwgOjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 20, 2016 at 7:55 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> If it turns out that it has little benefit, then we don't really need
>> to step up our user education. People can just keep using btree like
>
> The big problem is people coming from other databases and assuming our
> hash indexes have the same benefits over btree that exist in some other
> database software. The 9.5 warning at least helps with that.

I'd be curious what benefits people expect to get. For example, I
searched for "Oracle hash indexes" using Google and found this page:

http://logicalread.solarwinds.com/oracle-11g-hash-indexes-mc02/

It implies that their hash indexes are actually clustered indexes;
that is, the table data is physically organized into contiguous chunks
by hash bucket. Also, they can't split buckets on the fly. I think
the DB2 implementation is similar. So our hash indexes, even once we
add write-ahead logging and better concurrency, will be somewhat
different from those products. However, I'm not actually sure how
widely-used those index types are. I wonder if people who use hash
indexes in PostgreSQL are even likely to be familiar with those
technologies, and what expectations they might have.

For PostgreSQL, I expect the benefits of improving hash indexes to be
(1) slightly better raw performance for equality comparisons and (2)
better concurrency. The details aren't very clear at this stage. We
know that write performance is bad right now, even with Amit's
patches, but that's without the kill_prior_tuple optimization which is
probably extremely important but which has never been implemented for
hash indexes. Read performance is good, but there are still further
optimizations that haven't been done there, too, so it may be even
better by the time Amit gets done working in this area.

Of course, if we want to implement clustered indexes, that's going to
require significant changes to the heap format ... or the ability to
support multiple heap storage formats. I'm not opposed to that, but I
think it makes sense to fix the existing implementation first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-09-21 12:43:44 Re: pageinspect: Hash index support
Previous Message Jesper Pedersen 2016-09-21 12:27:06 Re: pageinspect: Hash index support