Neil Conway <neilc(at)samurai(dot)com> writes:
> The patch does two things: (1) change hash indexes to only store the
> key's hash value, not the entire key (2) store index elements within a
> hash bucket in order of hash key and search for matches via binary
> search. #1 is definitely a win in some in some circumstances (e.g.
> indexing large fields or types with expensive equality operators), but
> those aren't the common case. I'm surprised that #2 is not a more
> noticeable improvement...
It occurs to me that change #1 makes it cheaper to skip over index
entries that are in the same bucket but don't have the exact same hash
code; but it makes it significantly more expensive to skip over entries
that have the same hash code but aren't really equal to the key being
sought (since you have to visit the heap to find out they aren't equal).
Maybe your test workload had enough occurrences of the latter case to
balance out the win from the former.
I think it would be worth investigating a variant in which the index
stores both the hash code and the key value. This allows cheap
elimination of non-matching hash codes, and binary sorting of index
entries, without adding any extra trips to the heap. The downside is
that it makes the index larger so you incur more I/O there --- so this
might not be a win either.
> One possibility would be to provide an optional implementation of #1,
> perhaps via an alternate index operator class. That way people could
> select it in those situations in which it is worth using.
I think it would definitely be a good idea to make the lossy behavior
regards, tom lane
In response to
pgsql-patches by date
|Next:||From: Robert Treat||Date: 2005-05-28 16:51:23|
|Subject: Re: psql backslash consistency|
|Previous:||From: Tom Lane||Date: 2005-05-28 15:12:59|
|Subject: Re: psql backslash consistency |