Re: hash index work

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: hash index work
Date: 2005-06-25 00:03:08
Message-ID: 200506250003.j5P038621441@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches


I have added these TODO items based on Neil's ideas:

* Consider sorting hash buckets so entries can be found using a binary
search, rather than a linear scan
* In hash indexes, consider storing the hash value with or instead
of the key itself

However, Neil's tests don't show much of a win. Should I keep these
items on the TODO list?

---------------------------------------------------------------------------

Tom Lane wrote:
> Neil Conway <neilc(at)samurai(dot)com> writes:
> > The patch does two things: (1) change hash indexes to only store the
> > key's hash value, not the entire key (2) store index elements within a
> > hash bucket in order of hash key and search for matches via binary
> > search. #1 is definitely a win in some in some circumstances (e.g.
> > indexing large fields or types with expensive equality operators), but
> > those aren't the common case. I'm surprised that #2 is not a more
> > noticeable improvement...
>
> It occurs to me that change #1 makes it cheaper to skip over index
> entries that are in the same bucket but don't have the exact same hash
> code; but it makes it significantly more expensive to skip over entries
> that have the same hash code but aren't really equal to the key being
> sought (since you have to visit the heap to find out they aren't equal).
> Maybe your test workload had enough occurrences of the latter case to
> balance out the win from the former.
>
> I think it would be worth investigating a variant in which the index
> stores both the hash code and the key value. This allows cheap
> elimination of non-matching hash codes, and binary sorting of index
> entries, without adding any extra trips to the heap. The downside is
> that it makes the index larger so you incur more I/O there --- so this
> might not be a win either.
>
> > One possibility would be to provide an optional implementation of #1,
> > perhaps via an alternate index operator class. That way people could
> > select it in those situations in which it is worth using.
>
> I think it would definitely be a good idea to make the lossy behavior
> optional.
>
> regards, tom lane
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2005-06-25 01:50:23 Re: [SQL] ARRAY() returning NULL instead of ARRAY[]
Previous Message Bob 2005-06-24 20:19:32 Re: PL/pgSQL Debugger Support