[GSoC08]some detail plan of improving hash index

From: "Xiao Meng" <mx(dot)cogito(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: [GSoC08]some detail plan of improving hash index
Date: 2008-05-16 02:42:05
Message-ID: ded849dd0805151942j5b66d480u3f06aceae2b464af@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, hackers.

I'm reading the source codes of hash and reviewing Neil's old patch of
improving hash index.
Here is some detail plan. I'm trying to adjust Neil's patch to the current
version of PostgreSQL first. I'm not quite familar with the code yet, so
please make some comment.

* Phase 1. Just store hash value instead of hash keys

First, define a macro to make it optional.

Second, add a new function _hash_form_item to construct IndexTuple with hash
code to replace index_form_tuple uaed in hash access method. It seems easy
since we did'nt need to deal with TOAST.

Third, modify _hash_checkqual. We can first compare the hash value; if it's
the same, we compare the real key value.
Also, HashScanOpaqueData adds an element hashso_sk_hash to hold scan key's
hash value to support scan function.

Finally, it seems the system catalog pg_amop also need to be modified.
In Neil's patch, he set the amopreqcheck to be True.
In the documents, it means index hit must be rechecked in the document. But
I'm not so clear. Does it just means we need to recheck the value when use
_hash_chechqual?

* Phase 2. Sort the hash value when insert into the bucket and use binary
search when scan
Add a function _hash_binsearch to support binary search in a bucket;
It involved in all functions when we need to search, insert and delete.

* Phase 3. When it's necessary, store the real value.
When we insert a new index tuple , for example tp , to a bucket, we can
check whether there's the same hash code.
If there's already an index tuple with such a hash code, we store both the
hash code and real key of tp.
Alternatively, we add a flag to represent the tuple stores a real value or
just hash code. It seems a little complex.

Phase 1 seems extremely easy. I'm trying to do it first.
Additionally, I need a benchmark to test the performance. It seems there's
some tools list in http://wiki.postgresql.org/wiki/Performances_QA_testing .
Any advice?

--
Have a good day;-)
Best Regards,
Xiao Meng

━━━━━━━━━━━━━━━━━━━━━━━
Data and Knowledge Engineering Research Center
Harbin Institute of Technology, China
Gtalk: mx(dot)cogito(at)gmail(dot)com
MSN: cnEnder(at)live(dot)com
<http://xiaomeng.yo2.cn>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2008-05-16 04:33:05 Re: deadlock while doing VACUUM and DROP
Previous Message Andrew Chernow 2008-05-16 00:38:06 Re: libpq object hooks