Re: Questions about indexes?

From: Curt Sampson <cjs(at)cynic(dot)net>
To: Ryan Bradetich <rbradetich(at)uswest(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Questions about indexes?
Date: 2003-02-17 15:43:59
Message-ID: Pine.NEB.4.51.0302180021120.997@angelic-vtfw.cvpn.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 16 Feb 2003, Ryan Bradetich wrote:

> I am not sure why all the data is duplicated in the index ...

Well, you have to have the full key in the index, or how would you know,
when you look at a particular index item, if it actually matches what
you're searching for?

MS SQL server does have an interesting option that would help you a lot
in this case: clustered indexes. A table may have a single clustered
index, and each leaf node of the index stores not just the key but
actually the entire row. Thus, in a case like yours, you'd store the row
only once, not twice.

Without thinking too hard about it (my usual mode of operation on this
list :-)) this could probably be implemented in postgresql. But I don't
think it would be entirely trivial, and your case is unusual enough
that I very much doubt whether it would be worth implementing to fix
that alone. It would also offer the advantage that any lookup using the
clustered index would save fetching the heap page after that as well,
but it's hard to say if the savings would be worth the work.

> Since my only requirement is that the rows be unique, I have developed a
> custom MD5 function in C, and created an index on the MD5 hash of the
> concatanation of all the fields.

Well, that won't guarantee uniqueness, since it's perfectly possible
to have two different rows hash to the same value. (If that weren't
possible, your hash would have to contain as much information as the row
itself, and your space savings wouldn't be nearly so dramatic.)

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-02-17 16:24:41 Re: client_encoding directive is ignored in
Previous Message Anuradha Ratnaweera 2003-02-17 15:43:35 Group by count() and indexes