Quick Links

Re: hash index improving v3

From:	Kenneth Marshall <ktm(at)rice(dot)edu>
To:	Alex Hunsaker <badalex(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Xiao Meng <mx(dot)cogito(at)gmail(dot)com>, Zdenek Kotala <Zdenek(dot)Kotala(at)sun(dot)com>, pgsql-patches(at)postgresql(dot)org, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject:	Re: hash index improving v3
Date:	2008-09-09 13:48:39
Message-ID:	20080909134839.GC6714@it.is.rice.edu
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-patches

On Sat, Sep 06, 2008 at 08:23:05PM -0600, Alex Hunsaker wrote:
> On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >For the convenience of anyone intending to test, here is an updated
> >patch against CVS HEAD that incorporates Alex's fix.
>
> Here are the results for a table containing 1 million entries that
> will generate hash collisions. It paints a bad picture for the patch
> but then again im not sure how relevant the issue is. For example
> yesterday I imported a table with 10 million collisions and the create
> index is still running (now at about ~18 hours). Maybe we should warn
> if there are lots of collisions when creating the index and suggest
> you use a btree? Anyway here are the results.
>
I think that the glacial speed for generating a big hash index is
the same problem that the original code faced. Because of the collisions
you are unable to achieve the correct packing that the code assumes.
This results in the splitting/copying of every page in the hash index,
a very slow proposition. I had suggested adding some additional parameters
like fillfactor to accomodate these sorts of situations. Since your test
cuts the effective fill by 2 because of the many collisions, you would
need to adjust that calculation to avoid the tremendous amount of random
I/O generated by that mis-assumption.

> ./pgbench -c1 -n -t10 -f bench_createindex.sql
> cvs head: tps = 0.002169
> v5 : tps = 0.002196
>
> pgbench -c1 -n -t1000 -f bench_bitmap.sql
> cvs head: tps = 24.011871
> v5: tps = 2.543123
>
> pgbench -c1 -n -t1000 -f bench_index.sql
> cvs head: tps = 51.614502
> v5: tps = 3.205542
>
> pgbench -c1 -n -t1000 -f bench_seqscan.sql
> cvs head: tps = 8.553318
> v5: tps = 9.836091
>
> Table created via:
> create table test_hash (num int8);
> ./hash | psql -c 'copy test_hash from stdin;'

It would be useful to have an equivalent test for the hash-only
index without the modified int8 hash function, since that would
be more representative of its performance. The collision rates
that I was observing in my tests of the old and new mix() functions
was about 2 * (1/10000) of what you test generated. You could just
test against the integers between 1 and 2000000.

Ken

In response to

Re: hash index improving v3 at 2008-09-07 02:23:05 from Alex Hunsaker

Responses

Re: hash index improving v3 at 2008-09-10 01:23:03 from Alex Hunsaker

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Markus Wanner	2008-09-09 13:55:37	Re: Synchronous Log Shipping Replication
Previous Message	Robert Haas	2008-09-09 13:47:46	Re: Common Table Expressions (WITH RECURSIVE) patch

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Ryan Bradetich	2008-09-09 15:14:59	Re: [PgFoundry] Unsigned Data Types [1 of 2]
Previous Message	Brendan Jurd	2008-09-09 12:46:16	Re: [PATCHES] to_date() validation