Skip site navigation (1) Skip section navigation (2)

Re: Equivalent praxis to CLUSTERED INDEX?

From: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: mischa(dot)sandberg(at)telus(dot)net
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-26 18:14:32
Message-ID: 1093544071.349.106.camel@vulture.corp.neopolitan.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Tue, 2004-08-24 at 22:28, Mischa Sandberg wrote:
> I see that PG has a one-shot CLUSTER command, but doesn't support
> continuously-updated clustered indexes.
> 
> What I infer from newsgroup browsing is, such an index is impossible,
> given the MVCC versioning of records (happy to learn I'm wrong).


It is possible to have MVCC and ordered/indexed heaps, but it isn't
something you can just tack onto the currently supported types -- I
looked into this myself.  It would take substantial additional code
infrastructure to support it, basically an alternative heap system and
adding support for tables with odd properties to many parts of the
system.  Pretty non-trivial.

This is probably my #1 "I wish postgres had this feature" feature.  It
is a serious scalability enhancer for big systems and a pain to work
around not having.


> I'd be curious to know what other people, who've crossed this same
> bridge from MSSQL or Oracle or Sybase to PG, have devised,
> faced with the same kind of desired performance gain for retrieving
> blocks of rows with the same partial key.


The CLUSTER command is often virtually useless for precisely the kinds
of tables that need to be clustered.  My databases are on-line 24x7, and
the tables that are ideal candidates for clustering are in the range of
50-100 million rows. I can afford to lock these tables up for no more
than 5-10 minutes during off-peak in the hopes that no one notices, and
CLUSTER does not work remotely in the ballpark of that fast for tables
of that size.  People who can run CLUSTER in a cron job must either have
relatively small tables or regular large maintenance windows.


My solution, which may or may not work for you, was to write a table
partitioning system using the natural flexibility and programmability of
postgresql (e.g. table inheritance).  From this I automatically get a
roughly ordered heap according to the index I would cluster on, with
only slightly funky SQL access.  The end result works much better with
CLUSTER too, though CLUSTER is much less necessary at that point
because, at least for my particular purposes, the rows are mostly
ordered due to how the data was partitioned.

So there are ways to work around CLUSTER, but you'll have to be clever
and it will require tailoring the solution to your particular
requirements.


J. Andrew Rogers




In response to

Responses

pgsql-performance by date

Next:From: Matt ClarkDate: 2004-08-26 18:18:52
Subject: Re: Disabling transaction/outdated-tuple behaviour
Previous:From: Neil CooperDate: 2004-08-26 18:02:55
Subject: Disabling transaction/outdated-tuple behaviour

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group