Re: Index AM change proposals, redux

From: Decibel! <decibel(at)decibel(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index AM change proposals, redux
Date: 2008-04-24 16:24:29
Message-ID: 39CFF9DB-E0B5-4B69-B975-94FA120D5EEA@decibel.org
Lists: pgsql-hackers

On Apr 24, 2008, at 10:43 AM, Bruce Momjian wrote:

Bruce asked if these should be TODOs...

>> Index compression is possible in many ways, depending upon the
>> situation. All of the following sound similar at a high level, but
>> each covers a different use case.
>>
>> * For Long, Similar data e.g. Text we can use Prefix Compression
>> We still store one pointer per row, but we reduce the size of the
>> index by reducing the size of the key values. This requires us to
>> reach inside datatypes, so isn't a very general solution but is
>> probably an important one in the future for Text.

I think what would be even more useful is doing this within the table
itself, and then bubbling that up to the index.
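
To make the prefix idea concrete, here is a minimal Python sketch of
front coding over a sorted run of text keys. The function names are
made up for illustration and this is not how any existing index AM
stores keys; it only shows that each key can be kept as "bytes shared
with the previous key" plus the remaining suffix.

```python
import os

def prefix_compress(sorted_keys):
    """Store each key as (shared_prefix_len, suffix) relative to the previous key."""
    out = []
    prev = ""
    for key in sorted_keys:
        # Number of leading characters this key shares with the previous one.
        shared = len(os.path.commonprefix([prev, key]))
        out.append((shared, key[shared:]))
        prev = key
    return out

def prefix_decompress(entries):
    """Rebuild the original keys from (shared_prefix_len, suffix) pairs."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

Note that there is still one entry (and so one pointer) per row; only
the key bytes shrink, which matches the description quoted above.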

>> * For Unique/nearly-Unique indexes we can use Range Compression
>> We reduce the size of the index by holding one index pointer per
>> range of values, thus removing both keys and pointers. It's more
>> efficient than prefix compression and isn't datatype-dependent.

Definitely.
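
A rough Python sketch of what "one index pointer per range" could look
like, assuming (my assumption, not something stated in the proposal)
that the heap is physically ordered by the key so each block covers a
contiguous range. build_range_index and lookup are hypothetical names.

```python
from bisect import bisect_right

def build_range_index(blocks):
    """blocks: list of sorted key lists, one per heap block, in key order.
    Returns one (low_key, block_no) entry per non-empty block."""
    return [(blk[0], n) for n, blk in enumerate(blocks) if blk]

def lookup(range_index, blocks, key):
    """Return (block_no, offset) for key, or None if it is not present."""
    lows = [low for low, _ in range_index]
    pos = bisect_right(lows, key) - 1
    if pos < 0:
        return None  # key sorts before the first range
    block_no = range_index[pos][1]
    # The index only narrows the search to one block; recheck inside it.
    if key in blocks[block_no]:
        return (block_no, blocks[block_no].index(key))
    return None
```

The index holds one entry per block instead of one per row, at the
cost of having to scan the block to find the exact row.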

>> * For Highly Non-Unique Data we can use Duplicate Compression
>> The latter is the technique used by Bitmap Indexes. Efficient, but
>> not useful for unique/nearly-unique data

Also definitely. This would be hugely useful for things like "status"
or "type" fields.
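
For a "status" style column, the saving comes from storing each
distinct key once with a list of row pointers behind it. A minimal
Python sketch, where build_posting_lists is a made-up name and the
(block, offset) tuples merely stand in for tuple IDs:

```python
from collections import defaultdict

def build_posting_lists(rows):
    """rows: iterable of (key, tid) pairs.
    Returns {key: sorted list of tids}, so each key is stored only once."""
    postings = defaultdict(list)
    for key, tid in rows:
        postings[key].append(tid)
    return {key: sorted(tids) for key, tids in postings.items()}
```

With four rows and two distinct statuses, the structure holds two keys
rather than four; a bitmap index goes one step further and encodes
each list as a bitmap.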

>> * Multi-Column Leading Value Compression - if you have a multi-column
>> index, then leading columns are usually duplicated between rows
>> inserted at the same time. Using an on-block dictionary we can remove
>> duplicates. Only useful for multi-column indexes, possibly
>> overlapping/contained subset of the GIT use case.

Also useful, though I generally try to put the most diverse values
first in indexes to increase the odds of them being used. Perhaps if
we had compression this would change.
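
As a sketch of the on-block dictionary idea (Python, with hypothetical
function names, and a two-column index assumed for simplicity): each
distinct leading value is stored once per block and the entries carry
a small dictionary id instead.

```python
def compress_block(entries):
    """entries: list of (leading, trailing) key pairs on one index block.
    Returns (dictionary, rows) where rows hold a dictionary id in place
    of the leading value."""
    dictionary = []
    ids = {}
    rows = []
    for leading, trailing in entries:
        if leading not in ids:
            # First time this leading value appears on the block.
            ids[leading] = len(dictionary)
            dictionary.append(leading)
        rows.append((ids[leading], trailing))
    return dictionary, rows

def decompress_block(dictionary, rows):
    """Rebuild the original (leading, trailing) pairs."""
    return [(dictionary[i], trailing) for i, trailing in rows]
```

The fewer distinct leading values a block has, the more this saves,
which is why it favors a low-diversity leading column.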
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828
