Re: [HACKERS] Re: Multi field hash indexes

From: ocie(at)paracel(dot)com
To: hannu(at)trust(dot)ee (Hannu Krosing)
Cc: ocie(at)paracel(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Re: Multi field hash indexes
Date: 1998-03-17 21:07:11
Message-ID: 9803172107.AA02280@dolomite.paracel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hannu Krosing wrote:
>
> ocie(at)paracel(dot)com wrote:
>
> > Hannu Krosing wrote:
> > >
> > > The trick is to hash each field separately and then have a concatenation
> > >
> > > of the hash values.
> > >
> > > so assuming that for fields (a,b,c) values (120, 'friday', 3.1415927)
> > > hash
> > > to 'aa', 'bb', 'cc' the hash value of the whole tuple will be 'aabbcc'
> > >
> > > then the index can be used not only when selecting = a,b,c but also
> > > when
> > > selecting on _any_ of the fields in this index. For example when
> > > selecting
> > > for b='friday' one would examine only buckets where the middle is 'bb'
> > >
> >
> > HMM, this doesn't feel right. If I have an index on four int4s
> > a,b,c,d and I only know d, then it seems like searching for these in
> > the hash table could be as much work, or more work than a table scan.
>
> I was assuming that the hash table scan would be cheaper than the table scan

I think I see what you're getting at. The different parts are
separately hashed and then concatenated. I was thinking the other way
concatenate and THEN hash, which would spread the values all over the
table.

One problem with this (don't know how severe) is that a hash index on
a can answer 'select blah from t where a=10' by computing one hash
value and scaning the list at that point, whereas with an index on
(a,b), several hash values must be computed (one for each possible
resulting hash value of b).

>
> > > > Therefore, I would like to try implementing unique constraints on hash
> > >
> > > > indexes. Has this come up before? Are there any reasons not to
> > > > support this? As far as I understand, specifying an index method is
> > > > non standard (above and beyond standard) to begin with.
> > >
> > > While you are at it could you please comment if the GIST indexes are
> > > used or
> > > at least easily usable?
> >
> > What are GIST indexes?
>
> Some kind of generalised binary tree indexes that should make it easy to
> define additional indexing strategies.
>
> There is a directory access/gist in src/backend, and they are briefly
> mentioned in the PostgreSQL programmers guide.
>
> They seem to be offspring of some independent (of postgres) Berkeley project.

Interesting. I saw something in Dr Dobbs Journal (the last bastion of
programming magazines) about trinary trees. The idea is that there is
a less than branch for items less than the given node and one for
those greater and another for those equal. This can be used to index
items by using the less than/ greater than links to find the first
letter, then taking the 'equal' link and looking for the second
letter, etc. The problem is that these are not self balancing (like
red black trees or btrees), but maybe there is a way to make them so.

Ocie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1998-03-17 22:06:22 Re: [HACKERS] New patch to try...
Previous Message The Hermit Hacker 1998-03-17 20:35:41 New patch to try...