Re: Index Tuple Compression Approach?

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: "Decibel!" <decibel(at)decibel(dot)org>, Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-15 05:51:28
Message-ID: 46C29460.8070002@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Davis wrote:
> On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
>> Isn't this what Grouped Index Tuples is?
>
> http://community.enterprisedb.com/git/git-readme.txt
>
> It looks like GIT is a little different.
>
> GIT actually stores a lower-bound key of a contiguous* range of keys
> that all point to the same page, and for each of those ranges stores a
> bitmap of page offsets. A search searches first for an exact match in
> the index, and failing that, looks to see if the previous index tuple
> happens to be one of these ranges.
>
> The algorithm Chris is talking about stores a set of tuple ids (which
> include page and offset) for each distinct key.

Right.

> Both could be helpful, although I don't think they can work together
> very well.

What Chris is suggesting is basically a special case of GIT, where all
the heap tuples represented by an index tuple have the same key. I was
actually thinking of adding a flag to index tuples to indicate that
special case in GIT. We could effectively do both.

> GIT has the disadvantage that it's "lossy". It doesn't even store every
> key in the index, so it can't be sure that the match actually is a
> match. Thus, it returns candidate matches. That also has implications
> for enforcing UNIQUE (although it's not impossible, according to the
> readme). However, GIT can be used effectively on an index that happens
> to be unique. GIT also assumes a tree structure, and makes no sense for
> a hash index, and makes no sense for a types without ordering. GIT's
> space savings is dependent on the clustering of the table.
>
> Chris's suggestion would work on a UNIQUE index, but would be no help at
> all, because there would be no duplicate keys to collapse. However, it
> could be used for non-tree indexes. The space savings for this strategy
> is dependent on how repetitive the keys are.

Right. I wasn't concerned about the case of a lot of duplicate keys,
because the bitmap index is more efficient for that. And also because
HOT should reduce the number of index tuples with duplicate keys
pointing to different versions of the same row.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2007-08-15 05:55:42 Re: CVS corruption/mistagging?
Previous Message Tom Lane 2007-08-15 05:17:46 Re: CVS corruption/mistagging?