Re: hash index improving v3

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Xiao Meng <mx(dot)cogito(at)gmail(dot)com>, Zdenek Kotala <Zdenek(dot)Kotala(at)sun(dot)com>, pgsql-patches(at)postgresql(dot)org, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: hash index improving v3
Date: 2008-09-09 13:48:39
Message-ID: 20080909134839.GC6714@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Sat, Sep 06, 2008 at 08:23:05PM -0600, Alex Hunsaker wrote:
> On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >For the convenience of anyone intending to test, here is an updated
> >patch against CVS HEAD that incorporates Alex's fix.
>
> Here are the results for a table containing 1 million entries that
> will generate hash collisions. It paints a bad picture for the patch
> but then again im not sure how relevant the issue is. For example
> yesterday I imported a table with 10 million collisions and the create
> index is still running (now at about ~18 hours). Maybe we should warn
> if there are lots of collisions when creating the index and suggest
> you use a btree? Anyway here are the results.
>
I think that the glacial speed for generating a big hash index is
the same problem that the original code faced. Because of the collisions
you are unable to achieve the correct packing that the code assumes.
This results in the splitting/copying of every page in the hash index,
a very slow proposition. I had suggested adding some additional parameters
like fillfactor to accomodate these sorts of situations. Since your test
cuts the effective fill by 2 because of the many collisions, you would
need to adjust that calculation to avoid the tremendous amount of random
I/O generated by that mis-assumption.

> ./pgbench -c1 -n -t10 -f bench_createindex.sql
> cvs head: tps = 0.002169
> v5 : tps = 0.002196
>
> pgbench -c1 -n -t1000 -f bench_bitmap.sql
> cvs head: tps = 24.011871
> v5: tps = 2.543123
>
> pgbench -c1 -n -t1000 -f bench_index.sql
> cvs head: tps = 51.614502
> v5: tps = 3.205542
>
> pgbench -c1 -n -t1000 -f bench_seqscan.sql
> cvs head: tps = 8.553318
> v5: tps = 9.836091
>
> Table created via:
> create table test_hash (num int8);
> ./hash | psql -c 'copy test_hash from stdin;'

It would be useful to have an equivalent test for the hash-only
index without the modified int8 hash function, since that would
be more representative of its performance. The collision rates
that I was observing in my tests of the old and new mix() functions
was about 2 * (1/10000) of what you test generated. You could just
test against the integers between 1 and 2000000.

Ken

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Markus Wanner 2008-09-09 13:55:37 Re: Synchronous Log Shipping Replication
Previous Message Robert Haas 2008-09-09 13:47:46 Re: Common Table Expressions (WITH RECURSIVE) patch

Browse pgsql-patches by date

  From Date Subject
Next Message Ryan Bradetich 2008-09-09 15:14:59 Re: [PgFoundry] Unsigned Data Types [1 of 2]
Previous Message Brendan Jurd 2008-09-09 12:46:16 Re: [PATCHES] to_date() validation