Skip site navigation (1) Skip section navigation (2)

Re: hash index improving v3

From: "Alex Hunsaker" <badalex(at)gmail(dot)com>
To: "Kenneth Marshall" <ktm(at)rice(dot)edu>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Xiao Meng" <mx(dot)cogito(at)gmail(dot)com>, "Zdenek Kotala" <Zdenek(dot)Kotala(at)sun(dot)com>, pgsql-patches(at)postgresql(dot)org, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: hash index improving v3
Date: 2008-09-23 01:57:16
Message-ID: 34d269d40809221857m265d27bek29abadf934e19660@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
On Fri, Sep 12, 2008 at 8:29 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote:
>> On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
>> > Alex,
>> >
>> > I meant to check the performance with increasing numbers of collisions,
>> > not increasing size of the hashed item. In other words, something like
>> > this:
>> >
>> > for ($coll=500; $i<=1000000; $i=$i*2) {
>> >  for ($i=0; $i<=1000000; $i++) {
>> >    hash(int8 $i);
>> >  }
>> >  # add the appropriate number of collisions, distributed evenly to
>> >  # minimize the packing overrun problem
>> >  for ($dup=0; $dup<=$coll; $dup++) {
>> >    hash(int8 MAX_INT + $dup * 1000000/$coll);
>> >  }
>> > }
>> >
>> > Ken
>>
>> *doh* right something like this...
>>
>> create or replace function create_test_hash() returns bool as $$
>> declare
>>     coll integer default 500;
>>     -- tweak this to where create index gets really slow
>>     max_coll integer default 1000000;
>> begin
>>     loop
>>         execute 'create table test_hash_'|| coll ||'(num int8);';
>>         execute 'insert into test_hash_'|| coll ||' (num) select n
>> from generate_series(0, '|| max_coll ||') as n;';
>>         execute 'insert into test_hash_'|| coll ||' (num) select
>> (n+4294967296) * '|| max_col ||'/'|| coll ||'::int from
>> generate_series(0, '|| coll ||') as n;';
>>
>>         coll := coll * 2;
>>
>>         exit when coll >= max_coll;
>>     end loop;
>>     return true;
>> end;
>> $$ language 'plpgsql';
>>
>> And then benchmark each table, and for extra credit cluster the table
>> on the index and benchmark that.
>>
>> Also obviously with the hashint8 which just ignores the top 32 bits.
>>
>> Right?
>>
> Yes, that is exactly right.
>
> Ken

Ok I finally found time to do this, In summary looks like v5 scales
about the same as cvs head when the collisions are spread evenly
(obviously not HEAD with the hash patch applied...).   I couldn't test
cluster because we can't cluster on hash indexes...

benchmark with 50,000,000 rows and 500 collisions:
index creation time:
head: 326116.647 ms
v5:     269509.026 ms

pgbench  -n -c1 -T600 -f q.sql hash
head: tps = 3226.605611
v5:      tps = 3412.688884 (excluding connections establishing)

50,000,000 rows and 32,768,000 collisions
index time:
head:  576600.967 ms
v5    : 487949.490 ms

pgbench -n -c1 -T500 -f q.sql hash
head: tps = 3105.270185
v5:     tps = 3382.25782

You can see each result from 500 all the way up to 32,768,000
collision in the attached result.out

Attached files:
create_test_hash.sql: function I used to create the test tables
result.out: output from bench.pl which shows the pgbench results and
the create index times
bench.pl: stupid little perl script to test pgbench each of the
created tables from create_test_hash.pl

Attachment: result.out
Description: application/octet-stream (13.0 KB)
Attachment: create_test_hash.sql
Description: application/octet-stream (661 bytes)
Attachment: bench.pl
Description: application/x-perl (1.5 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Greg Sabino MullaneDate: 2008-09-23 02:31:05
Subject: pg_type.h regression?
Previous:From: Bruce MomjianDate: 2008-09-23 01:51:16
Subject: Re: Proposed patch: make SQL interval-literal syntax work per spec

pgsql-patches by date

Next:From: Alex HunsakerDate: 2008-09-23 03:25:03
Subject: Re: hash index improving v3
Previous:From: Simon RiggsDate: 2008-09-22 22:06:01
Subject: Re: [PATCHES] Infrastructure changes for recovery

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group