
[patch] gsoc, improving hash index v2

From: "Xiao Meng" <mx(dot)cogito(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, "Kenneth Marshall" <ktm(at)rice(dot)edu>
Subject: [patch] gsoc, improving hash index v2
Date: 2008-07-25 14:26:05
Message-ID: ded849dd0807250726s6c4cc895oabd8579c375a6538@mail.gmail.com
Lists: pgsql-hackers
Hi, hackers.
I posted a hash index patch in a previous thread:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00794.php
I apologize for the poor readability of the previous patch, and thank you all
for your comments.
Here is a new patch that fixes some bugs in the previous one.
I'm posting it here to get feedback and further suggestions. Any comments are
welcome.
Changes since v1:
- fix a crash in _h_spool when testing with a large data set
- adjust the target-fillfactor calculation in _hash_metapinit
- remove the HASHVALUE_ONLY macro
- replace _create_hash_desc with _get_hash_desc to get a hard-coded tuple
  descriptor for the hash index
- replace index_getattr with _hash_get_datum to get the hash key datum and
  avoid too many calls to _get_hash_desc and index_getattr
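The HASHVALUE_ONLY and _hash_get_datum items point at the central idea of this work as I read it: each index tuple keeps only the 32-bit hash code of the key rather than the key itself. That keeps entries small and fixed-size, but a matching hash code is only a candidate, because distinct keys can collide, so the real value must be rechecked against the heap. The sketch below illustrates that trade-off under stated assumptions; HashEntry, fnv1a_32 and lookup are hypothetical names, not PostgreSQL APIs, and FNV-1a is swapped in as a simple example hash, not the hash function PostgreSQL uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical index entry: only the 32-bit hash code is kept, not the key. */
typedef struct
{
    uint32_t hashcode;   /* hash of the indexed key */
    size_t   heap_pos;   /* stand-in for a heap tuple pointer (TID) */
} HashEntry;

/* 32-bit FNV-1a, used here only as a simple example hash function. */
static uint32_t
fnv1a_32(const char *s)
{
    uint32_t h = 2166136261u;

    for (; *s != '\0'; s++)
    {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/*
 * Return the heap position of the row whose key equals `key`, or -1.
 * A matching hash code is only a candidate: different keys can share a
 * hash code, so each candidate is rechecked against the heap value.
 */
static long
lookup(const HashEntry *entries, size_t n,
       const char *const *heap, const char *key)
{
    uint32_t h = fnv1a_32(key);

    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].hashcode != h)
            continue;                               /* cheap hash-code filter */
        if (strcmp(heap[entries[i].heap_pos], key) == 0)
            return (long) entries[i].heap_pos;      /* recheck passed */
    }
    return -1;
}
```

A real hash index would first map the hash code to a bucket and scan only that bucket's pages; the linear scan over `entries` here stands in for scanning a single bucket.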

Here is what I intend to do next.
Todo:
- gather block-access I/O statistics
- write unit tests using pgunitest to test the following
  (Josh Berkus suggested these in this thread:
  http://archives.postgresql.org/pgsql-hackers/2008-05/msg00535.php ):
  - bulk load, both COPY and INSERT
  - single-row updates, inserts and deletes
  - batch update by key
  - batch update by other index
  - batch delete by key
  - batch delete by other index
  - concurrent index updates (64 connections inserting/deleting concurrently)

I ran some simple tests like the ones mentioned here
(http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php),
using a word list of 3,628,800 unique words; the table size is 139MB.
I'll run tests on a bigger data set later.
Index         BuildTime       IndexSize
----------    ------------    ---------
btree          51961.123 ms   93MB
hash          411069.264 ms   2048MB
hash-patch     36288.931 ms   128MB
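The adjusted target-fillfactor calculation in _hash_metapinit decides how many buckets a freshly built index starts with. The sketch below shows one common shape for such an estimate: entries per page at a target fill factor, then enough buckets for the expected row count, rounded up to a power of two. Every constant here (8 KB pages, a 75% target fill, 20 bytes per entry, a floor of 10 entries per page) is an assumption for illustration, not a value read from hash-v2.patch, and the function names are hypothetical. With these assumptions, 3,628,800 rows give 307 entries per page and 16384 buckets, i.e. 128MB of 8 KB bucket pages (ignoring metapage and bitmap pages), which lines up with the hash-patch figure above; treat that as a plausibility check, not a confirmation.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not taken from the patch). */
#define BLOCK_SIZE       8192u  /* page size in bytes */
#define TARGET_FILL_PCT  75u    /* target fill factor, percent */
#define ENTRY_WIDTH      20u    /* bytes per entry: header + hash code + line pointer */
#define MIN_FFACTOR      10u    /* lower bound on entries per bucket page */

/* Round n up to the next power of two (n >= 1). */
static uint32_t
next_pow2(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Entries that fit in one bucket page at the target fill factor. */
static uint32_t
target_ffactor(void)
{
    uint32_t ffactor = (BLOCK_SIZE * TARGET_FILL_PCT / 100u) / ENTRY_WIDTH;

    return ffactor < MIN_FFACTOR ? MIN_FFACTOR : ffactor;
}

/* Initial bucket count: ceil(num_tuples / ffactor), rounded up to a power of 2. */
static uint32_t
initial_buckets(uint64_t num_tuples)
{
    uint64_t ffactor = target_ffactor();
    uint64_t needed = (num_tuples + ffactor - 1) / ffactor;

    if (needed < 2)
        needed = 2;                               /* always start with 2 buckets */
    assert(needed <= (UINT64_C(1) << 31));        /* keep next_pow2 in range */
    return next_pow2((uint32_t) needed);
}
```

Rounding to a power of two keeps the mapping from hash code to bucket a simple bit mask; the cost is that the initial index can be up to twice as large as the raw estimate.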

dict=# SELECT * from hash-dict where word = '0234567891' ;
    word
------------
 0234567891
(1 row)

Time: 33.960 ms
dict=# SELECT * from btree-dict where word = '0234567891' ;
    word
------------
 0234567891
(1 row)

Time: 1.662 ms

dict=# SELECT * from hash2-dict where word = '0234567891' ;
    word
------------
 0234567891
(1 row)

Time: 1.457 ms

Finally, there is a problem I ran into.
I'm confused by the function _hash_checkqual.
IMHO, the index tuple stores only one column here, so key->sk_attno should
always be 1, and scanKeySize should be 1, since we don't support
multi-column hash indexes yet.
Am I misunderstanding something?
/*
 * _hash_checkqual -- does the index tuple satisfy the scan conditions?
 */
bool
_hash_checkqual(IndexScanDesc scan, IndexTuple itup)
{
    TupleDesc    tupdesc = RelationGetDescr(scan->indexRelation);
    ScanKey        key = scan->keyData;
    int            scanKeySize = scan->numberOfKeys;

    IncrIndexProcessed();

    while (scanKeySize > 0)
    {
        Datum        datum;
        bool        isNull;
        Datum        test;

        datum = index_getattr(itup,
                              key->sk_attno,
                              tupdesc,
                              &isNull);

        /* assume sk_func is strict */
        if (isNull)
            return false;
        if (key->sk_flags & SK_ISNULL)
            return false;

        test = FunctionCall2(&key->sk_func, datum, key->sk_argument);

        if (!DatumGetBool(test))
            return false;

        key++;
        scanKeySize--;
    }

    return true;
}
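One possible reading, offered as an assumption to be confirmed rather than an answer: even on a one-column index, a query such as `WHERE word = 'a' AND word = 'b'` may hand the access method two scan keys, both with sk_attno 1. In that case sk_attno would always be 1 while scanKeySize could still exceed 1, and the loop would remain necessary. The toy model below illustrates that case; ToyScanKey and toy_checkqual are hypothetical names, not PostgreSQL types or functions, and plain integer equality stands in for sk_func.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified scan key: column number plus comparison value. */
typedef struct
{
    int attno;      /* 1-based column number, like sk_attno */
    int argument;   /* value to compare against, like sk_argument */
} ToyScanKey;

/*
 * Toy analogue of the loop in _hash_checkqual for a one-column index whose
 * stored value is `value`: every key must be satisfied.  Each key refers to
 * column 1, but there may still be several keys.
 */
static bool
toy_checkqual(const ToyScanKey *keys, size_t nkeys, int value)
{
    for (size_t i = 0; i < nkeys; i++)
    {
        assert(keys[i].attno == 1);        /* one-column index */
        if (value != keys[i].argument)     /* equality stands in for sk_func */
            return false;
    }
    return true;
}
```

With two keys `{1, 42}` and `{1, 7}`, no single value can satisfy both, so the check fails even though both keys name the same column.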

Hope to hear from you.
-- 
Best Regards,
Xiao Meng

DKERC, Harbin Institute of Technology, China
Gtalk: mx(dot)cogito(at)gmail(dot)com
MSN: cnEnder(at)live(dot)com
http://xiaomeng.yo2.cn

Attachment: hash-v2.patch
Description: text/x-diff (18.3 KB)
