Re: [PATCHES] updated hash functions for postgresql v1

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] updated hash functions for postgresql v1
Date: 2009-01-09 20:29:14
Message-ID: 20090109202914.GA4930@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 09, 2009 at 12:04:15PM -0800, Jeff Davis wrote:
> On Mon, 2008-12-22 at 13:47 -0600, Kenneth Marshall wrote:
> > Dear PostgreSQL developers,
> >
> > I am re-sending this to keep this last change to the
> > internal hash function on the radar.
> >
>
> Hi Ken,
>
> A few comments:
>
> 1. New patch with very minor changes attached.
>
> 2. I reverted the change you made to indices.sgml. We still don't use
> WAL for hash indexes, and in my opinion we should continue to discourage
> their use until we do use WAL. We can add back in the comment that hash
> indexes are suitable for large keys if we have some results to show
> that.
>
> 3. There was a regression test failure in union.sql because the ordering
> of the results was different. I updated the regression test.
>
> 4. Hash functions affect a lot more than hash indexes, so I ran through
> a variety of tests that use a HashAggregate plan. Test setup and results
> are attached. These results show no difference between the old and the
> new code (about 0.1% better).
>
> 5. The hash index build time shows some improvement. The new code won in
> every instance in which a there were a lot of duplicates in the table
> (100 distinct values, 50K of each) by around 5%.
>
> The new code appeared to be the same or slightly worse in the case of
> hash index builds with few duplicates (1000000 distinct values, 5 of
> each). The difference was about 1% worse, which is probably just noise.
>
> Note: I'm no expert on hash functions. Take all of my tests with a grain
> of salt.
>
> I would feel a little better if I saw at least one test that showed
> better performance of the new code on a reasonable-looking distribution
> of data. The hash index build that you showed only took a second or two
> -- it would be nice to see a test that lasted at least a minute.
>
> Regards,
> Jeff Davis
>
>

Jeff,

Thanks for the review. I would not really expect any differences in hash
index build times other than normal noise variances. The most definitive
benchmark that I have seen was done with my original patch submission
in March posted by Luke of Greenplum:

"We just applied this and saw a 5 percent speedup on a hash aggregation
query with four columns in a 'group by' clause run against a single
TPC-H table."

I wonder if they would be willing to re-run their test? Thanks again.

Regards,
Ken

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2009-01-09 20:58:14 Re: Improving compressibility of WAL files
Previous Message Jeff Davis 2009-01-09 20:04:15 Re: [PATCHES] updated hash functions for postgresql v1