Re: Index AM change proposals, redux

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index AM change proposals, redux
Date: 2008-04-24 16:21:35
Message-ID: 1209054095.4259.1598.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2008-04-24 at 11:43 -0400, Bruce Momjian wrote:

> > Index compression is possible in many ways, depending upon the
> > situation. All of the following sound similar at a high level, but each
> > covers a different use case.
> >
> > * For Long, Similar data e.g. Text we can use Prefix Compression
> > We still store one pointer per row, but we reduce the size of the index
> > by reducing the size of the key values. This requires us to reach inside
> > datatypes, so isn't a very general solution but is probably an important
> > one in the future for Text.
> >
> > * For Unique/nearly-Unique indexes we can use Range Compression
> > We reduce the size of the index by holding one index pointer per range
> > of values, thus removing both keys and pointers. It's more efficient
> > than prefix compression and isn't datatype-dependant.
> >
> > * For Highly Non-Unique Data we can use Duplicate Compression
> > The latter is the technique used by Bitmap Indexes. Efficient, but not
> > useful for unique/nearly-unique data
> >
> > * Multi-Column Leading Value Compression - if you have a multi-column
> > index, then leading columns are usually duplicated between rows inserted
> > at the same time. Using an on-block dictionary we can remove duplicates.
> > Only useful for multi-column indexes, possibly overlapping/contained
> > subset of the GIT use case.
>
> Should any of these others be TODOs?

Possibly, though I think we have the priority items covered now.

Greg has passed comment on this thread of the difficulties inherent with
approach 1.

I've previously argued that technique 1 can be handled much better by
having "implied functional index use", e.g. an index can be defined on
SUBSTR(col, 10) rather than on the whole column, yet still used
automatically without needing to mention SUBSTR.
Link: Hackers 11 Jul 2006.

There's a variety of ways to improve multi-column indexes beyond what we
currently support, though those haven't ever been discussed on hackers
to my knowledge.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Decibel! 2008-04-24 16:24:29 Re: Index AM change proposals, redux
Previous Message Tom Lane 2008-04-24 16:21:33 Re: Is this TODO item done?