Re: [PATCH]-hash index improving

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Xiao Meng <mx(dot)cogito(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH]-hash index improving
Date: 2008-07-17 21:01:36
Message-ID: 20080717210135.GN337@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 17, 2008 at 04:24:28PM -0400, Jonah H. Harris wrote:
> On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx(dot)cogito(at)gmail(dot)com> wrote:
> > The patch store hash code only in the index tuple.
> > It based on Neil Conway's patch with an old version of PostgreSQL.
> > It passes the regression test but I didn't test the performance yet.
> > Anyone interested can make a performance test;-)
> > You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> > original implementation.
> > It's a preliminary implementation and I'm looking for input here.
> > Hope to hear from you.
>
> I've spent some time today performing tests similar to those mentioned
> here (http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php)
>
> Using a word list of 2650024 unique words (maximum length is 118
> bytes), build times are still high, but I'm not really seeing any
> performance improvements over b-tree. I haven't profiled it yet, but
> my test is as follows:
>
> - Created the dict table
> - Loaded the dict table
> - Counted the records in the dict table
> - Created the index
> - Shutdown the database
> - Randomly selected 200 entries from the word list and built a file
> full of (SELECT * FROM dict WHERE word = '<word>') queries using them.
> - Cleared out the kernel cache
> - Started the database
> - Ran the query file
>
> The result of this is between 5-10ms improvement in the overall
> execution time of all 200 queries. The time-per-query is practically
> unnoticeable. As this is in the range of noise, methinks there's a
> larger problem with hash indexes. I haven't looked heavily into their
> implementation, but do you any of you know of any major design flaws?
>
Jonah,

Thank you for running these tests. I was trying to reproduce my initial
tests on the original system to make it more apples to apples, but the
latest release needs more resources semaphore-wise than the 8.2 and
to fix it on Solaris 8 I will need a reboot. Would you mind posting
the timings for the hash_only index versus the hash_value versus the
btree index for the same test. Also, what is the on-disk size of all
three indexes? This will allow us to figure out the bucket/page load
or fill-factor for each scenario.

The basic implementation looked reasonable. I will take a look at
the patch this evening.

Regards,
Ken

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2008-07-17 21:03:48 Re: autovacuum crash due to null pointer
Previous Message Jonah H. Harris 2008-07-17 20:24:28 Re: [PATCH]-hash index improving