Re: Index AM change proposals, redux

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index AM change proposals, redux
Date: 2008-04-24 15:43:35
Message-ID: 200804241543.m3OFhZZ04664@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> > > Many users would be very interested if we could significantly reduce the
> > > size of the main index on their largest tables.
> >
> > Yes, basically GIT allows index compression for clustered tables, and
> > stated that way it is clear it would be a big feature if we had it for
> > 8.4.
>
> > > I would at least like to see clustered indexes acknowledged as a TODO
> > > item, so we keep the door open for a future implementation based around
> > > the basic concept of GIT.
> >
> > Yes, it is already a TODO:
> >
> > * Consider compressing indexes by storing key values duplicated in
> > several rows as a single index entry
>
> That sounds similar, but definitely does not describe what GIT does.
>
> GIT holds one index pointer for a *range* of values held on one block.
> e.g. if we have a table with values 1-200 in it then GIT would store
> *one* index pointer for all 200 rows, even though the values are all
> different.
>
> That property makes it very different from the TODO item stated, since
> GIT can work very well on Unique indexes, which clearly do not have
> duplicated keys.

Agreed, TODO updated:

* Consider smaller indexes that record a range of values per heap page,
rather than having one index entry for every heap row

This is useful if the heap is clustered by the indexed values.
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00341.php
http://archives.postgresql.org/pgsql-hackers/2007-02/msg01264.php
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00465.php
http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00014.php
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00487.php
http://archives.postgresql.org/pgsql-hackers/2008-04/msg01589.php

> I'd prefer the term "Clustered Indexes" rather than the name GIT, which
> is also a slang insult.

I try to avoid technical terms which might not be familiar, but instead
describe exactly what is requested.

> Index compression is possible in many ways, depending upon the
> situation. All of the following sound similar at a high level, but each
> covers a different use case.
>
> * For Long, Similar data e.g. Text we can use Prefix Compression
> We still store one pointer per row, but we reduce the size of the index
> by reducing the size of the key values. This requires us to reach inside
> datatypes, so isn't a very general solution but is probably an important
> one in the future for Text.
>
> * For Unique/nearly-Unique indexes we can use Range Compression
> We reduce the size of the index by holding one index pointer per range
> of values, thus removing both keys and pointers. It's more efficient
> than prefix compression and isn't datatype-dependant.
>
> * For Highly Non-Unique Data we can use Duplicate Compression
> The latter is the technique used by Bitmap Indexes. Efficient, but not
> useful for unique/nearly-unique data
>
> * Multi-Column Leading Value Compression - if you have a multi-column
> index, then leading columns are usually duplicated between rows inserted
> at the same time. Using an on-block dictionary we can remove duplicates.
> Only useful for multi-column indexes, possibly overlapping/contained
> subset of the GIT use case.

Should any of these others be TODOs?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-04-24 15:50:44 Re: Proposed patch - psql wraps at window width
Previous Message Bruce Momjian 2008-04-24 15:38:46 Is this TODO item done?