Skip site navigation (1) Skip section navigation (2)

Re: Index AM change proposals, redux

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index AM change proposals, redux
Date: 2008-04-24 15:43:35
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
Simon Riggs wrote:
> > > Many users would be very interested if we could significantly reduce the
> > > size of the main index on their largest tables.
> > 
> > Yes, basically GIT allows index compression for clustered tables, and
> > stated that way it is clear it would be a big feature if we had it for
> > 8.4.
> > > I would at least like to see clustered indexes acknowledged as a TODO
> > > item, so we keep the door open for a future implementation based around
> > > the basic concept of GIT.
> > 
> > Yes, it is already a TODO:
> > 
> > * Consider compressing indexes by storing key values duplicated in
> >   several rows as a single index entry
> That sounds similar, but definitely does not describe what GIT does.
> GIT holds one index pointer for a *range* of values held on one block.
> e.g. if we have a table with values 1-200 in it then GIT would store
> *one* index pointer for all 200 rows, even though the values are all
> different.
> That property makes it very different from the TODO item stated, since
> GIT can work very well on Unique indexes, which clearly do not have
> duplicated keys.

Agreed, TODO updated:

	* Consider smaller indexes that record a range of values per heap page,
	  rather than having one index entry for every heap row
	  This is useful if the heap is clustered by the indexed values.

> I'd prefer the term "Clustered Indexes" rather than the name GIT, which
> is also a slang insult.

I try to avoid technical terms which might not be familiar, but instead
describe exactly what is requested.

> Index compression is possible in many ways, depending upon the
> situation. All of the following sound similar at a high level, but each
> covers a different use case.
> * For Long, Similar data e.g. Text we can use Prefix Compression
> We still store one pointer per row, but we reduce the size of the index
> by reducing the size of the key values. This requires us to reach inside
> datatypes, so isn't a very general solution but is probably an important
> one in the future for Text. 
> * For Unique/nearly-Unique indexes we can use Range Compression
> We reduce the size of the index by holding one index pointer per range
> of values, thus removing both keys and pointers. It's more efficient
> than prefix compression and isn't datatype-dependant. 
> * For Highly Non-Unique Data we can use Duplicate Compression
> The latter is the technique used by Bitmap Indexes. Efficient, but not
> useful for unique/nearly-unique data
> * Multi-Column Leading Value Compression - if you have a multi-column
> index, then leading columns are usually duplicated between rows inserted
> at the same time. Using an on-block dictionary we can remove duplicates.
> Only useful for multi-column indexes, possibly overlapping/contained
> subset of the GIT use case.

Should any of these others be TODOs?

  Bruce Momjian  <bruce(at)momjian(dot)us>

  + If your life is a hard drive, Christ can be your backup. +

In response to


pgsql-hackers by date

Next:From: Bruce MomjianDate: 2008-04-24 15:50:44
Subject: Re: Proposed patch - psql wraps at window width
Previous:From: Bruce MomjianDate: 2008-04-24 15:38:46
Subject: Is this TODO item done?

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group