Re: Index AM change proposals, redux

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index AM change proposals, redux
Date: 2008-04-24 11:52:30
Message-ID: 1209037950.4259.1522.camel@ebony.site
Lists: pgsql-hackers

On Thu, 2008-04-24 at 13:13 +0200, Martijn van Oosterhout wrote:
> On Thu, Apr 24, 2008 at 10:11:02AM +0100, Simon Riggs wrote:
> > Index compression is possible in many ways, depending upon the
> > situation. All of the following sound similar at a high level, but each
> > covers a different use case.
>
> True, but there is one significant difference:

> > * For Long, Similar data e.g. Text we can use Prefix Compression
> > * For Highly Non-Unique Data we can use Duplicate Compression
> > * Multi-Column Leading Value Compression - if you have a multi-column
>
> These are all not lossy and so are candidate to use on any b-tree even
> by default. They don't affect plan-construction materially, except
> perhaps in cost calculations. Given the index tuple overhead I don't
> see how you could lose.

True, but it is important to recognise that this is not a discussion
about the one best technique for index compression. I think there are
many techniques, covering many use cases. If we assume there is only
one, the other use cases get thrown out the door because "there was no
consensus". If the TODO list does not say clearly what we want, it will
never happen: no sponsor will pay for an idea that didn't make the
list. Worse, they will pay for whatever is on the list, even if it
wasn't exactly what we wanted.
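
To make the first of those techniques concrete, here is a minimal
sketch of prefix compression for a page of sorted text keys. It is
illustrative only: the names and layout are invented for the example,
not PostgreSQL's actual btree item format.

#include <stdio.h>
#include <string.h>

/*
 * Length of the prefix shared by every key in a sorted array -- in a
 * real index this would be computed per page.
 */
static size_t
common_prefix_len(const char *keys[], int nkeys)
{
    size_t len = strlen(keys[0]);

    for (int i = 1; i < nkeys; i++)
    {
        size_t j = 0;

        while (j < len && keys[i][j] == keys[0][j])
            j++;
        len = j;                /* the shared prefix can only shrink */
    }
    return len;
}

int
main(void)
{
    const char *keys[] = {"customer_1001", "customer_1002", "customer_1044"};
    size_t      plen = common_prefix_len(keys, 3);

    /* Store the prefix once per page, then only the suffix per item. */
    printf("prefix stored once: %.*s\n", (int) plen, keys[0]);
    for (int i = 0; i < 3; i++)
        printf("item suffix: %s\n", keys[i] + plen);
    return 0;
}

Duplicate compression is analogous: store the key once plus a list of
TIDs, rather than repeating the key for every heap tuple.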

> > * For Unique/nearly-Unique indexes we can use Range Compression
>
> This one is lossy and so does affect possible plans. I believe the term
> for this is "sparse" index. Also, it seems harder to optimise, since
> it would seem to me the index would have to have some idea of how
> "close" values are together.

Lossy in terms of values, true, but we already support lossy searches
with BitmapHeapScans, so let's not be fooled by how that word sounds.
Lossiness was Tom's original complaint against Block Indexes in Sep
2006, which is why the non-lossy GIT design was conceived: GIT knows
exactly which tuples are indexed, and which are not.

There is no need for the index to know how close the values are, only
that values in that range live on that block.
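
As a toy illustration, a range-compressed (sparse) index might keep
just one entry per heap block, holding the lowest key on that block; a
lookup returns a candidate block whose tuples must then be rechecked,
exactly as a lossy bitmap heap scan rechecks its qual. Everything here
is invented for the example -- this is not the GIT code:

#include <stdio.h>

typedef struct
{
    int     min_key;            /* smallest key stored on the block */
    int     blockno;            /* heap block the entry points at */
} RangeEntry;

/*
 * Return the block whose range may contain 'key'.  Because the index
 * is lossy, the caller must recheck the actual tuples on that block.
 */
static int
range_index_lookup(const RangeEntry *idx, int n, int key)
{
    int candidate = idx[0].blockno;

    for (int i = 0; i < n; i++)
    {
        if (idx[i].min_key <= key)
            candidate = idx[i].blockno;
        else
            break;              /* entries are sorted by min_key */
    }
    return candidate;
}

int
main(void)
{
    /* Assume keys 1..99 landed on block 0, 100..199 on block 1, etc. */
    RangeEntry idx[] = {{1, 0}, {100, 1}, {200, 2}};

    printf("key 150 may be on block %d (recheck required)\n",
           range_index_lookup(idx, 3, 150));
    return 0;
}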

> > As with HOT, all of these techniques need to be controlled by
> > heuristics. Taken to extremes, these techniques can hurt other aspects
> > of performance, so it's important we don't just write the code but
> > also investigate reasonable limits for behaviour.
>
> For the first three I can't imagine what the costs would be, except
> the memory used to store the expanded data once it's read in.
>
> Have a nice day,
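
To illustrate the kind of heuristic limit I mean for duplicate
compression: a hypothetical cap on how many heap tuples one compressed
entry may cover, so a single entry never becomes disproportionately
expensive to update or vacuum. The threshold and names are invented:

#include <stdbool.h>
#include <stdio.h>

/* Assumed tunable limit -- invented for illustration. */
#define MAX_TIDS_PER_POSTING    256

/*
 * Heuristic: only merge another duplicate into an existing compressed
 * entry while the entry stays below the cap; beyond that, start a new
 * entry so no single entry grows without bound.
 */
static bool
can_add_duplicate(int ntids_in_entry)
{
    return ntids_in_entry < MAX_TIDS_PER_POSTING;
}

int
main(void)
{
    printf("entry with 10 tids: %s\n",
           can_add_duplicate(10) ? "extend" : "start new entry");
    printf("entry with 256 tids: %s\n",
           can_add_duplicate(256) ? "extend" : "start new entry");
    return 0;
}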
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
