Re: Index/Function organized table layout

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: James Rogers <jamesr(at)best(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index/Function organized table layout
Date: 2003-10-03 06:34:30
Message-ID: 1065162869.2421.22.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

James Rogers kirjutas N, 02.10.2003 kell 23:44:
> On Thu, 2003-10-02 at 12:09, Hannu Krosing wrote:
> > So what you really need is the CLUSTER command to leave pages half-empty
> > and the tuple placement logic on inserts/updates to place new tuples
> > near the place where they would be placed by CLUSTER. I.e. the code that
> > does actual inserting should be aware of CLUSTERing.
>
>
> Not exactly. What you are describing is more akin to partitioning or
> hash-organized tables i.e. sorting insert/update tuples to various pages
> according to some hash function.

What I actually thought I was describing is how CLUSTER should work in a
postgres flavour of MVCC storage ;). Not the CLUSTER command, but the
whole feature.

> B-Tree organized tables basically make the table and the primary index
> the same thing, and tend to be most useful for tables that use a single
> multi-column primary key index for queries. This has the effect of
> putting all the tuples for a typical query in the same small number of
> heap pages (and therefore buffers), allowing very efficient access in
> the typical case with the caveat that non-indexed queries will be quite
> slow.

AFAICS we could resolve this problem (querying indexes only) by keeping
a copy of visibility info (tmin,tmax,...) in index tuples. This would
make index updates bigger and thus slower, so this should be optional.

If you then put all fields in primary key, then the main table could be
dropped. If there is no "data" table then no other indexes would then be
allowed, or they must be "double-indexes" referencing the primary key,
not tuple and thus even bigger ...

> B-Tree organized tables are particularly useful when the insert order is
> orthogonal to the typical query order. As I mentioned before, tables
> that store parallel collections of time-series data are classic
> examples. Often the data is inserted into the pages in order that can
> roughly be described as (timestamp, id), but is queried using (id,
> timestamp) as the index. If you have enough ids, you end up with the
> pathological case where you typically have one relevant tuple per page
> for a given query.

But if we had clustered the table on (id, timestamp), then the data
would be in right order for queries, if cluster worked well.

> The nuisance would be keeping track of which pages are collecting which
> tuples. Knowing the CLUSTER index doesn't tell you much about which
> pages would currently be a good place to put a new tuple. You could
> always markup the index that CLUSTER uses to keep track of good
> candidates (plus some additional structures), but the more I think about
> that, the more it looks like a nasty hack.

Yeah, index-organized tables seems exact fit for your problem, but then
my abstract idea of what clustering should do is exactly that - keep
tuples in roughly the same order as an index ;)

So what really is needed is a smart tuple-placer which can keep tuples
that are close (as defined by index) together in a small number of
pages. These pages themselves need not be coninuous, they can be
sprinkled around in the whole data table, but they need to stay clusters
of index-close tuples.

------------
Hannu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kris Jurka 2003-10-03 10:22:56 Re: Quick question
Previous Message Shridhar Daithankar 2003-10-03 06:29:02 Re: count(*) slow on large tables