Quick Links

hash index work

From:	Neil Conway <neilc(at)samurai(dot)com>
To:	pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	hash index work
Date:	2005-05-13 05:13:48
Message-ID:	4284378C.2080205@samurai.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-patches

This patch implements two changes to hash indexes:

- rather than storing the values of the index keys, we just store their
hash value instead. The hash opclasses have been marked "lossy" so that
the executor will recheck the scan qualifier to catch hash collisions.

- within a page of a hash bucket, the entries are stored sorted by hash
value. When doing a hash index scan, we can then binary search within a
page rather than using a linear search.

Unfortunately, I haven't been able to measure any performance
improvement from either of these changes :-\

I've run tests on two types of tables: int4 keys and relatively short
textual keys (I don't think there's much point testing longer keys: the
patches should obviously be a win there, so I'm concerned about speeding
up the common case). In both cases, the difference in index scan
performance isn't significant: it's about the same with and without the
patches. The indexes have been on tables with 1 million rows (of int4)
and 400,000 rows (of text). I would test with larger tables, but
creating hash indexes is so slow that even these sizes are painful
enough. Since indexes of this size should be cached in memory the
reduction in I/O for the text case isn't being measured (the index is
about 30% smaller than it is when we store the full text value), but
even so I would have expected the binary search to produce noticeable
gains...

Perhaps I've made a coding error that has pessimized the performance
somehow (although nothing obvious shows up in profiles), or perhaps the
linear scan of the pages of the hash bucket isn't a performance problem
in the first place, at least for types with a cheap equality operator.
Some oprofile data seems to support this -- for example, in the int4
case, we spend less than 0.5% of the time in _hash_next and children,
and only 1.8% of the runtime in hashgetmulti and children (with the
vanilla CVS HEAD code).

-Neil

Attachment	Content-Type	Size
hash_index_opt-10.patch	text/x-patch	35.9 KB

Responses

Re: hash index work at 2005-05-28 02:42:28 from Bruce Momjian

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Neil Conway	2005-05-13 06:40:39	Re: CSV consecutive newline bug
Previous Message	Alvaro Herrera	2005-05-13 03:06:59	Refactoring in lock.c