Re: Hash Indexes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-10-03 17:00:35
Message-ID: CAMkU=1xcDMWCOLV6z9R7YTCHGousLu85M5Vw=Vbv9RBvLL_Yfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 29, 2016 at 5:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Sep 29, 2016 at 8:07 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> > On Wed, Sep 28, 2016 at 8:06 PM, Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> >> On 2016-09-28 15:04:30 -0400, Robert Haas wrote:
> >>> Andres already
> >>> stated that he things working on btree-over-hash would be more
> >>> beneficial than fixing hash, but at this point it seems like he's the
> >>> only one who takes that position.
> >>
> >> Note that I did *NOT* take that position. I was saying that I think we
> >> should evaluate whether that's not a better approach, doing some simple
> >> performance comparisons.
> >
> > I, for one, agree with this position.
>
> Well, I, for one, find it frustrating. It seems pretty unhelpful to
> bring this up only after the code has already been written. The first
> post on this thread was on May 10th. The first version of the patch
> was posted on June 16th. This position was first articulated on
> September 15th.
>
> But, by all means, please feel free to do the performance comparison
> and post the results. I'd be curious to see them myself.
>

I've done a simple comparison using pgbench's default transaction, in which
all the primary keys have been dropped and replaced with indexes of either
hash or btree type, alternating over many rounds.

I run 'pgbench -c16 -j16 -T 900 -M prepared' on an 8 core machine with a
scale of 40. All the data fits in RAM, but not in shared_buffers (128MB).

I find a 4% improvement for hash indexes over btree indexes, 9324.744
vs 9727.766. The difference is significant at p-value of 1.9e-9.

The four versions of hash indexes (HEAD, concurrent, wal, cache, applied
cumulatively) have no statistically significant difference in performance
from each other.

I certainly don't see how btree-over-hash-over-integer could be better than
direct btree-over-integer.

I think I don't see improvement in hash performance with the concurrent and
cache patches because I don't have enough cores to get to the contention
that those patches are targeted at. But since the concurrent patch is a
prerequisite to the wal patch, that is enough to justify it even without a
demonstrated performance boost. A 4% gain is not astonishing, but is nice
to have provided we can get it without giving up crash safety.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-10-03 17:28:40 Re: Hash Indexes
Previous Message Robert Haas 2016-10-03 16:55:05 Re: Tracking wait event for latches