Re: A thought on Index Organized Tables

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>, "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A thought on Index Organized Tables
Date: 2010-02-23 15:18:02
Message-ID: 4B839D4B020000250002F4F5@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> On Mon, 2010-02-22 at 08:51 +0200, Heikki Linnakangas wrote:
>> Gokulakannan Somasundaram wrote:
>
>> > May i get a little clarification on this issue? Will we be
>> > supporting the IOT feature in postgres in future?
>>
>> What seems like the best path to achieve the kind of performance
>> benefits that IOTs offer is allowing index-only-scans using the
>> visibility map.
>
> I don't agree with that. Could you explain why you think that
> would be the case? It would be a shame to have everybody think you
> can solve a problem if it turned out not to be the case.

I'd like to be clear on what feature we're discussing. There has
been mention of an organization where there is no heap per se, but
all columns are stored in the leaf node of one of the table's
indexes (which is the structure referred to as a CLUSTERED INDEX in
some other popular products). There has been some mention of
storing some of the data out-of-line, which could be considered to
be already covered by TOAST. I know that one of the things which
makes this technique particularly effective with such things as name
columns for a clustered index is that these other products store
index entries after the first in a page with a length that matches
the previous entry and the differing data at the tail, which we
don't yet have.

Clearly it's not trivial, but there are certainly cases where it can
be a big performance win. Besides the obvious issues around having
a relation which functions like both an index and a heap (at the
leaf level), there are the details of having other indexes point to
these leaf nodes, creating and dropping clustered indexes, impact on
vacuums, etc.

Situations where clustered indexes tended to help:

(1) Most access through a particular index -- often one less random
read per access.

(2) Frequent sequential access through a range of values in an
index -- turn random access into mostly sequential.

(3) Index values comprise a large portion of each tuple -- avoid
redundant storage, reducing disk footprint, thereby improving cache
hits.

Points 1 and 2 could be covered to some degree by index-only scans,
particularly if additional columns are added to indexes to make them
"covering indexes". Index-only scans don't help with 3 at all; in
fact, adding the additional columns to indexes to allow that
optimization tends to work against it.

-Kevin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2010-02-23 15:28:06 Re: A thought on Index Organized Tables
Previous Message Heikki Linnakangas 2010-02-23 15:11:24 Re: A thought on Index Organized Tables