Re: Hash index todo list item

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Raney <twraney(at)comcast(dot)net>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash index todo list item
Date: 2007-09-28 18:48:07
Message-ID: 200709281848.l8SIm8O22044@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Tom Raney wrote:
> We are pleased to announce an upcoming patch to the hash index code
> which improves build time and index size, based on this item in the
> TODO list:
> During index creation, pre-sort the tuples to improve build speed
> http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php
>
> Using our implementation, build times and index sizes are
> comparable with btree index build times and index sizes.
> For example, for a particular 12 million row relation, the
> 8.2.4 release requires over 2.8 hours to build the hash index.
> With our patch, the index is built in 80 seconds.
> Here is the link for a graph, showing a comparative analysis of
> btree and hash index builds and describing the benchmark data.
> http://web.cecs.pdx.edu/~raneyt/pg/
>
> We are currently cleaning up the patch and will submit it asap.
>
> Regards,
> Shreya Bhargava <shreya_bhargav(at)yahoo(dot)com>
> Tom Raney <twraney(at)comcast(dot)net>
>
>
> Kenneth Marshall wrote:
> > Dear PostgreSQL Hackers:
> >
> > After following the hackers mailing list for quite a while,
> > I am going to start investigating what will need to be done
> > to improve hash index performance. Below are the pieces of
> > this project that I am currently considering:
> >
> > 1. Characterize the current hash index implementation against
> > the BTree index, with a focus on space utilization and
> > lookup performance against a collection of test data. This
> > will give a baseline performance test to evaluate the impact
> > of changes. I initially do not plan to bench the hash creation
> > process since my initial focus will be on lookup performance.
> >
> > 2. Evaluate the performance of different hash index implementations
> > and/or changes to the current implementation. My current plan is
> > to keep the implementation as simple as possible and still provide
> > the desired performance. Several hash index suggestions deal with
> > changing the layout of the keys on a page to improve lookup
> > performance, including reducing the bucket size to a fraction of
> > a page or only storing the hash value on the page, instead of
> > the index value itself. My goal in this phase is to produce one
> > or more versions with better performance than the current BTree.
> >
> > 3. Look at build time and concurrency issues with the addition of
> > some additional tests to the test bed. (1)
> >
> > 4. Repeat as needed.
> >
> > This is the rough plan. Does anyone see anything critical that
> > is missing at this point? Please send me any suggestions for test
> > data and various performance test ideas, since I will be working
> > on that first.
> >
> > Regards,
> > Ken Marshall
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: don't forget to increase your free space map settings
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-09-28 19:31:05 Re: Enforcing database encoding and locale match
Previous Message Kevin Grittner 2007-09-28 18:46:16 Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)