Re: hash index improving v3

From: "Alex Hunsaker" <badalex(at)gmail(dot)com>
To: "Kenneth Marshall" <ktm(at)rice(dot)edu>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Xiao Meng" <mx(dot)cogito(at)gmail(dot)com>, "Zdenek Kotala" <Zdenek(dot)Kotala(at)sun(dot)com>, pgsql-patches(at)postgresql(dot)org, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: hash index improving v3
Date: 2008-09-11 03:49:25
Message-ID: 34d269d40809102049y2599a381kf409c0b5f18b6170@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
>> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
>> > I think that the glacial speed for generating a big hash index is
>> > the same problem that the original code faced.
>>
>> Yeah sorry, I was not saying it was a new problem with the patch. Err
>> at least not trying to :) *Both* of them had been running at 18+ (I
>> finally killed them sometime Sunday or around +32 hours...)
>>
>> > It would be useful to have an equivalent test for the hash-only
>> > index without the modified int8 hash function, since that would
>> > be more representative of its performance. The collision rates
>> > that I was observing in my tests of the old and new mix() functions
>> > was about 2 * (1/10000) of what you test generated. You could just
>> > test against the integers between 1 and 2000000.
>>
>> Sure but then its pretty much just a general test of patch vs no
>> patch. i.e. How do we measure how much longer collisions take when
>> the new patch makes things faster? That's what I was trying to
>> measure... Though I apologize I don't think that was clearly stated
>> anywhere...
>
> Right, I agree that we need to benchmark the collision processing
> time difference. I am not certain that two data points is useful
> information. There are 469 collisions with our current hash function
> on the integers from 1 to 2000000. What about testing the performance
> at power-of-2 multiples of 500, i.e. 500, 1000, 2000, 4000, 8000,...
> Unless you adjust the fill calculation for the CREATE INDEX, I would
> stop once the time to create the index spikes. It might also be useful
> to see if a CLUSTER affects the performance as well. What do you think
> of that strategy?

Not sure it will be a good benchmark of collision processing. Then
again you seem to have studied the hash algo closer than me. Ill go
see about doing this. Stay tuned.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alex Hunsaker 2008-09-11 04:17:31 Re: hash index improving v3
Previous Message Alex Hunsaker 2008-09-11 03:45:24 Re: hash index improving v3

Browse pgsql-patches by date

  From Date Subject
Next Message Alex Hunsaker 2008-09-11 04:17:31 Re: hash index improving v3
Previous Message Alex Hunsaker 2008-09-11 03:45:24 Re: hash index improving v3