Re: hash index work

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: hash index work
Date: 2005-05-28 02:42:28
Message-ID: 200505280242.j4S2gSn09485@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches


Neil, I have added these item to the TODO list. Do you plan on applying
this?

---------------------------------------------------------------------------

Neil Conway wrote:
> This patch implements two changes to hash indexes:
>
> - rather than storing the values of the index keys, we just store their
> hash value instead. The hash opclasses have been marked "lossy" so that
> the executor will recheck the scan qualifier to catch hash collisions.
>
> - within a page of a hash bucket, the entries are stored sorted by hash
> value. When doing a hash index scan, we can then binary search within a
> page rather than using a linear search.
>
> Unfortunately, I haven't been able to measure any performance
> improvement from either of these changes :-\
>
> I've run tests on two types of tables: int4 keys and relatively short
> textual keys (I don't think there's much point testing longer keys: the
> patches should obviously be a win there, so I'm concerned about speeding
> up the common case). In both cases, the difference in index scan
> performance isn't significant: it's about the same with and without the
> patches. The indexes have been on tables with 1 million rows (of int4)
> and 400,000 rows (of text). I would test with larger tables, but
> creating hash indexes is so slow that even these sizes are painful
> enough. Since indexes of this size should be cached in memory the
> reduction in I/O for the text case isn't being measured (the index is
> about 30% smaller than it is when we store the full text value), but
> even so I would have expected the binary search to produce noticeable
> gains...
>
> Perhaps I've made a coding error that has pessimized the performance
> somehow (although nothing obvious shows up in profiles), or perhaps the
> linear scan of the pages of the hash bucket isn't a performance problem
> in the first place, at least for types with a cheap equality operator.
> Some oprofile data seems to support this -- for example, in the int4
> case, we spend less than 0.5% of the time in _hash_next and children,
> and only 1.8% of the runtime in hashgetmulti and children (with the
> vanilla CVS HEAD code).
>
> -Neil

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2005-05-28 04:08:49 Re: [HACKERS] patches for items from TODO list
Previous Message Bruce Momjian 2005-05-28 02:35:13 Re: AllocSetReset improvement