Re: Hash index todo list item

From: Tom Raney <twraney(at)comcast(dot)net>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-21 00:12:45
Message-ID: 46F30C7D.6060402@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We are pleased to announce an upcoming patch to the hash index code
which improves build time and index size, based on this item in the
TODO list:
During index creation, pre-sort the tuples to improve build speed
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.
For example, for a particular 12 million row relation, the
8.2.4 release requires over 2.8 hours to build the hash index.
With our patch, the index is built in 80 seconds.
Here is the link for a graph, showing a comparative analysis of
btree and hash index builds and describing the benchmark data.
http://web.cecs.pdx.edu/~raneyt/pg/

We are currently cleaning up the patch and will submit it asap.

Regards,
Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com>
Tom Raney <twraney(at)comcast(dot)net>

Kenneth Marshall wrote:
> Dear PostgreSQL Hackers:
>
> After following the hackers mailing list for quite a while,
> I am going to start investigating what will need to be done
> to improve hash index performance. Below are the pieces of
> this project that I am currently considering:
>
> 1. Characterize the current hash index implementation against
> the BTree index, with a focus on space utilization and
> lookup performance against a collection of test data. This
> will give a baseline performance test to evaluate the impact
> of changes. I initially do not plan to bench the hash creation
> process since my initial focus will be on lookup performance.
>
> 2. Evaluate the performance of different hash index implementations
> and/or changes to the current implementation. My current plan is
> to keep the implementation as simple as possible and still provide
> the desired performance. Several hash index suggestions deal with
> changing the layout of the keys on a page to improve lookup
> performance, including reducing the bucket size to a fraction of
> a page or only storing the hash value on the page, instead of
> the index value itself. My goal in this phase is to produce one
> or more versions with better performance than the current BTree.
>
> 3. Look at build time and concurrency issues with the addition of
> some additional tests to the test bed. (1)
>
> 4. Repeat as needed.
>
> This is the rough plan. Does anyone see anything critical that
> is missing at this point? Please send me any suggestions for test
> data and various performance test ideas, since I will be working
> on that first.
>
> Regards,
> Ken Marshall
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-09-21 00:32:57 Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Previous Message Tom Lane 2007-09-20 23:27:54 Re: [COMMITTERS] pgsql: Silence Solaris compiler warnings, per buildfarm.