Re: Hash index todo list item

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-10 14:36:28
Message-ID: 46E5566C.3050106@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kenneth Marshall wrote:
> On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
>
>> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
>>
>>> ... This is the rough plan. Does anyone see anything critical that
>>> is missing at this point?
>>>
>> Sounds pretty good. Let me brain-dump one item on you: one thing that
>> hash currently has over btree is the ability to handle index items up
>> to a full page. Now, if you go with a scheme that only stores hash
>> codes and not the underlying data, you can not only handle that but
>> improve on it; but if you reduce the bucket size and don't remove the
>> data, it'd be a step backward. The idea I had about dealing with that
>> was to only reduce the size of primary buckets --- if it's necessary to
>> add overflow space to a bucket, the overflow units are still full pages.
>> So an index tuple up to a page in size can always be accommodated by
>> adding an overflow page to the bucket.
>>
>> Just a thought, but AFAIR it's not in the archives anywhere.
>>
>> regards, tom lane
>>
>>
> I was thinking about this some more, and it strikes me that we can
> keep the page size = bucket size = overflow size in the new scheme
> of storing the hash value in the index and still reduce the effective
> bucket size. Let's make the new size the (page size / 2^n) where n
> is chosen appropriately. Then we we store the value in the page we
> simply use n bits of the hash to determine which sub-piece of the
> page to use to actually store the value. We may need to add a multiplier
> to adjust the decision to split the page based on the mini-page. This
> should allow us to much more densely pack the pages and approach the
> 1 item per bucket. This could easily shrink the size of the index by
> a factor of 2^n. Let me know what you think.
>
My personal opinion is that something like this is required to take best
advantage of hashes. I didn't respond immediately because I don't have
advice on what the best implementation would look like.

I have also been wondering about the effect of a hash index that
includes multiple values to the same key (either a non-unique key, or
different tuples from the same key due to table updates). I started by
thinking that the maximum number of hash entries per hash bucket should
be kept to 2 or 3 to prevent reduction in performance to that of btree,
but I think this doesn't work if multiple tuples can have the same key.
Unless - the maps is hash value =1:1> index page =1:1> hash bucket =1:N>
hash value =1:M=> tuples. Guarantee than N is small (either <= 2 or <=4
depending on performance evaluation) by re-hashing if N ever becomes > 2
or > 4. Choose a sparse harsh. Let 1:M be indefinite? Also, optimize the
1:M=1:1 case, such that the value can be inline?

For most cases, I would think the above model would make it cheap to
determine if the key does not exist, as well as the 1:1 (hash value:key)
case requiring a single page lookup. Update performance would suffer,
but lookup should be faster than btree in these cases, as btree often
requires > 1 index page scan.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-09-10 14:37:26 Re: integrated tsearch doesn't work with non utf8 database
Previous Message Oleg Bartunov 2007-09-10 14:24:46 Re: Include Lists for Text Search