Re: Hash Indexes

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-09-16 06:56:53
Message-ID: 0e0c6adb-2fd8-d0d5-6b87-71c7478bc753@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16/09/16 18:35, Amit Kapila wrote:

> On Thu, Sep 15, 2016 at 7:53 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Hi,
>>
>> On 2016-05-10 17:39:22 +0530, Amit Kapila wrote:
>>> For making hash indexes usable in production systems, we need to improve
>>> its concurrency and make them crash-safe by WAL logging them.
>> One earlier question about this is whether that is actually a worthwhile
>> goal. Are the speed and space benefits big enough in the general case?
>>
> I think there will surely by speed benefits w.r.t reads for larger
> indexes. All our measurements till now have shown that there is a
> benefit varying from 30~60% (for reads) with hash index as compare to
> btree, and I think it could be even more if we further increase the
> size of index. On space front, I have not done any detailed study, so
> it is not right to conclude anything, but it appears to me that if the
> index is on char/varchar column where size of key is 10 or 20 bytes,
> hash indexes should be beneficial as they store just hash-key.
>
>> Could those benefits not be achieved in a more maintainable manner by
>> adding a layer that uses a btree over hash(columns), and adds
>> appropriate rechecks after heap scans?
>>
> I don't think it can be faster for reads than using real hash index,
> but surely one can have that as a workaround.
>
>> Note that I'm not saying that hash indexes are not worthwhile, I'm just
>> doubtful that question has been explored sufficiently.
>>
> I think theoretically hash indexes are faster than btree considering
> logarithmic complexity (O(1) vs. O(logn)), also the results after
> recent optimizations indicate that hash indexes are faster than btree
> for equal to searches. I am not saying after the recent set of
> patches proposed for hash indexes they will be better in all kind of
> cases. It could be beneficial for cases where indexed columns are not
> updated heavily.
>
> I think one can definitely argue that we can some optimizations in
> btree and make them equivalent or better than hash indexes, but I am
> not sure if it is possible for all-kind of use-cases.
>

I think having the choice for a more equality optimized index design is
desirable. Now that they are wal logged they are first class citizens so
to speak. I suspect that there are a lot of further speed optimizations
that can be considered to tease out the best performance - now that the
basics of reliability have been sorted. I think this patch/set of
patches is/are important!

regards

Mark

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Gordiychuk 2016-09-16 07:11:48 Re: Stopping logical replication protocol
Previous Message Amit Kapila 2016-09-16 06:42:40 Re: Write Ahead Logging for Hash Indexes