Quick Links

Re: Equivalent praxis to CLUSTERED INDEX?

From:	"J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Cc:	mischa(dot)sandberg(at)telus(dot)net
Subject:	Re: Equivalent praxis to CLUSTERED INDEX?
Date:	2004-08-26 18:14:32
Message-ID:	1093544071.349.106.camel@vulture.corp.neopolitan.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Tue, 2004-08-24 at 22:28, Mischa Sandberg wrote:
> I see that PG has a one-shot CLUSTER command, but doesn't support
> continuously-updated clustered indexes.
>
> What I infer from newsgroup browsing is, such an index is impossible,
> given the MVCC versioning of records (happy to learn I'm wrong).

It is possible to have MVCC and ordered/indexed heaps, but it isn't
something you can just tack onto the currently supported types -- I
looked into this myself. It would take substantial additional code
infrastructure to support it, basically an alternative heap system and
adding support for tables with odd properties to many parts of the
system. Pretty non-trivial.

This is probably my #1 "I wish postgres had this feature" feature. It
is a serious scalability enhancer for big systems and a pain to work
around not having.

> I'd be curious to know what other people, who've crossed this same
> bridge from MSSQL or Oracle or Sybase to PG, have devised,
> faced with the same kind of desired performance gain for retrieving
> blocks of rows with the same partial key.

The CLUSTER command is often virtually useless for precisely the kinds
of tables that need to be clustered. My databases are on-line 24x7, and
the tables that are ideal candidates for clustering are in the range of
50-100 million rows. I can afford to lock these tables up for no more
than 5-10 minutes during off-peak in the hopes that no one notices, and
CLUSTER does not work remotely in the ballpark of that fast for tables
of that size. People who can run CLUSTER in a cron job must either have
relatively small tables or regular large maintenance windows.

My solution, which may or may not work for you, was to write a table
partitioning system using the natural flexibility and programmability of
postgresql (e.g. table inheritance). From this I automatically get a
roughly ordered heap according to the index I would cluster on, with
only slightly funky SQL access. The end result works much better with
CLUSTER too, though CLUSTER is much less necessary at that point
because, at least for my particular purposes, the rows are mostly
ordered due to how the data was partitioned.

So there are ways to work around CLUSTER, but you'll have to be clever
and it will require tailoring the solution to your particular
requirements.

J. Andrew Rogers

In response to

Equivalent praxis to CLUSTERED INDEX? at 2004-08-25 05:28:42 from Mischa Sandberg

Responses

Re: Equivalent praxis to CLUSTERED INDEX? at 2004-08-26 18:18:53 from Bruce Momjian

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Matt Clark	2004-08-26 18:18:52	Re: Disabling transaction/outdated-tuple behaviour
Previous Message	Neil Cooper	2004-08-26 18:02:55	Disabling transaction/outdated-tuple behaviour