Skip site navigation (1) Skip section navigation (2)

hash index work

From: Neil Conway <neilc(at)samurai(dot)com>
To: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: hash index work
Date: 2005-05-13 05:13:48
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-patches
This patch implements two changes to hash indexes:

- rather than storing the values of the index keys, we just store their 
hash value instead. The hash opclasses have been marked "lossy" so that 
the executor will recheck the scan qualifier to catch hash collisions.

- within a page of a hash bucket, the entries are stored sorted by hash 
value. When doing a hash index scan, we can then binary search within a 
page rather than using a linear search.

Unfortunately, I haven't been able to measure any performance 
improvement from either of these changes :-\

I've run tests on two types of tables: int4 keys and relatively short 
textual keys (I don't think there's much point testing longer keys: the 
patches should obviously be a win there, so I'm concerned about speeding 
up the common case). In both cases, the difference in index scan 
performance isn't significant: it's about the same with and without the 
patches. The indexes have been on tables with 1 million rows (of int4) 
and 400,000 rows (of text). I would test with larger tables, but 
creating hash indexes is so slow that even these sizes are painful 
enough. Since indexes of this size should be cached in memory the 
reduction in I/O for the text case isn't being measured (the index is 
about 30% smaller than it is when we store the full text value), but 
even so I would have expected the binary search to produce noticeable 

Perhaps I've made a coding error that has pessimized the performance 
somehow (although nothing obvious shows up in profiles), or perhaps the 
linear scan of the pages of the hash bucket isn't a performance problem 
in the first place, at least for types with a cheap equality operator. 
Some oprofile data seems to support this -- for example, in the int4 
case, we spend less than 0.5% of the time in _hash_next and children, 
and only 1.8% of the runtime in hashgetmulti and children (with the 
vanilla CVS HEAD code).


Attachment: hash_index_opt-10.patch
Description: text/x-patch (35.9 KB)


pgsql-patches by date

Next:From: Neil ConwayDate: 2005-05-13 06:40:39
Subject: Re: CSV consecutive newline bug
Previous:From: Alvaro HerreraDate: 2005-05-13 03:06:59
Subject: Refactoring in lock.c

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group