Re: hash index improving v3

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Xiao Meng <mx(dot)cogito(at)gmail(dot)com>, Zdenek Kotala <Zdenek(dot)Kotala(at)sun(dot)com>, pgsql-patches(at)postgresql(dot)org, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: hash index improving v3
Date: 2008-09-11 15:24:38
Message-ID: 20080911152437.GM6714@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Wed, Sep 10, 2008 at 10:17:31PM -0600, Alex Hunsaker wrote:
> On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> > On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> >> On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
> >>> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> >>> > I think that the glacial speed for generating a big hash index is
> >>> > the same problem that the original code faced.
> >>>
> >>> Yeah sorry, I was not saying it was a new problem with the patch. Err
> >>> at least not trying to :) *Both* of them had been running at 18+ (I
> >>> finally killed them sometime Sunday or around +32 hours...)
> >>>
> >>> > It would be useful to have an equivalent test for the hash-only
> >>> > index without the modified int8 hash function, since that would
> >>> > be more representative of its performance. The collision rates
> >>> > that I was observing in my tests of the old and new mix() functions
> >>> > was about 2 * (1/10000) of what you test generated. You could just
> >>> > test against the integers between 1 and 2000000.
> >>>
> >>> Sure but then its pretty much just a general test of patch vs no
> >>> patch. i.e. How do we measure how much longer collisions take when
> >>> the new patch makes things faster? That's what I was trying to
> >>> measure... Though I apologize I don't think that was clearly stated
> >>> anywhere...
> >>
> >> Right, I agree that we need to benchmark the collision processing
> >> time difference. I am not certain that two data points is useful
> >> information. There are 469 collisions with our current hash function
> >> on the integers from 1 to 2000000. What about testing the performance
> >> at power-of-2 multiples of 500, i.e. 500, 1000, 2000, 4000, 8000,...
> >> Unless you adjust the fill calculation for the CREATE INDEX, I would
> >> stop once the time to create the index spikes. It might also be useful
> >> to see if a CLUSTER affects the performance as well. What do you think
> >> of that strategy?
> >
> > Not sure it will be a good benchmark of collision processing. Then
> > again you seem to have studied the hash algo closer than me. Ill go
> > see about doing this. Stay tuned.
>
> Assuming I understood you correctly, And I probably didn't this does
> not work very well because you max out at 27,006 values before you get
> this error:
> ERROR: index row size 8152 exceeds hash maximum 8144
> HINT: Values larger than a buffer page cannot be indexed.
>
> So is a power-of-2 multiple of 500 not simply:
> x = 500;
> while(1)
> {
> print x;
> x *= 2;
> }
>
> ?
>
Alex,

I meant to check the performance with increasing numbers of collisions,
not increasing size of the hashed item. In other words, something like
this:

for ($coll=500; $i<=1000000; $i=$i*2) {
for ($i=0; $i<=1000000; $i++) {
hash(int8 $i);
}
# add the appropriate number of collisions, distributed evenly to
# minimize the packing overrun problem
for ($dup=0; $dup<=$coll; $dup++) {
hash(int8 MAX_INT + $dup * 1000000/$coll);
}
}

Ken

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2008-09-11 15:34:33 Re: pg_regress inputdir
Previous Message Heikki Linnakangas 2008-09-11 15:17:57 Re: Synchronous Log Shipping Replication

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2008-09-11 17:34:55 Re: [PATCHES] to_date() validation
Previous Message Peter Eisentraut 2008-09-11 12:39:01 Re: still alive?