Simon Riggs wrote:
> > > Many users would be very interested if we could significantly reduce the
> > > size of the main index on their largest tables.
> > Yes, basically GIT allows index compression for clustered tables, and
> > stated that way it is clear it would be a big feature if we had it for
> > 8.4.
> > > I would at least like to see clustered indexes acknowledged as a TODO
> > > item, so we keep the door open for a future implementation based around
> > > the basic concept of GIT.
> > Yes, it is already a TODO:
> > * Consider compressing indexes by storing key values duplicated in
> > several rows as a single index entry
> That sounds similar, but definitely does not describe what GIT does.
> GIT holds one index pointer for a *range* of values held on one block.
> e.g. if we have a table with values 1-200 in it then GIT would store
> *one* index pointer for all 200 rows, even though the values are all
> That property makes it very different from the TODO item stated, since
> GIT can work very well on Unique indexes, which clearly do not have
> duplicated keys.
Agreed, TODO updated:
* Consider smaller indexes that record a range of values per heap page,
rather than having one index entry for every heap row
This is useful if the heap is clustered by the indexed values.
> I'd prefer the term "Clustered Indexes" rather than the name GIT, which
> is also a slang insult.
I try to avoid technical terms which might not be familiar, but instead
describe exactly what is requested.
> Index compression is possible in many ways, depending upon the
> situation. All of the following sound similar at a high level, but each
> covers a different use case.
> * For Long, Similar data e.g. Text we can use Prefix Compression
> We still store one pointer per row, but we reduce the size of the index
> by reducing the size of the key values. This requires us to reach inside
> datatypes, so isn't a very general solution but is probably an important
> one in the future for Text.
> * For Unique/nearly-Unique indexes we can use Range Compression
> We reduce the size of the index by holding one index pointer per range
> of values, thus removing both keys and pointers. It's more efficient
> than prefix compression and isn't datatype-dependant.
> * For Highly Non-Unique Data we can use Duplicate Compression
> The latter is the technique used by Bitmap Indexes. Efficient, but not
> useful for unique/nearly-unique data
> * Multi-Column Leading Value Compression - if you have a multi-column
> index, then leading columns are usually duplicated between rows inserted
> at the same time. Using an on-block dictionary we can remove duplicates.
> Only useful for multi-column indexes, possibly overlapping/contained
> subset of the GIT use case.
Should any of these others be TODOs?
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
+ If your life is a hard drive, Christ can be your backup. +
In response to
pgsql-hackers by date
|Next:||From: Bruce Momjian||Date: 2008-04-24 15:50:44|
|Subject: Re: Proposed patch - psql wraps at window width|
|Previous:||From: Bruce Momjian||Date: 2008-04-24 15:38:46|
|Subject: Is this TODO item done?|