Skip site navigation (1) Skip section navigation (2)

Re: Equivalent praxis to CLUSTERED INDEX?

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, mischa(dot)sandberg(at)telus(dot)net
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-26 18:18:53
Message-ID: 200408261818.i7QIIrr09640@candle.pha.pa.us (view raw or flat)
Thread:
Lists: pgsql-performance
How do vendors actually implement auto-clustering?  I assume they move
rows around during quiet periods or have lots of empty space in each
value bucket.

---------------------------------------------------------------------------

J. Andrew Rogers wrote:
> On Tue, 2004-08-24 at 22:28, Mischa Sandberg wrote:
> > I see that PG has a one-shot CLUSTER command, but doesn't support
> > continuously-updated clustered indexes.
> > 
> > What I infer from newsgroup browsing is, such an index is impossible,
> > given the MVCC versioning of records (happy to learn I'm wrong).
> 
> 
> It is possible to have MVCC and ordered/indexed heaps, but it isn't
> something you can just tack onto the currently supported types -- I
> looked into this myself.  It would take substantial additional code
> infrastructure to support it, basically an alternative heap system and
> adding support for tables with odd properties to many parts of the
> system.  Pretty non-trivial.
> 
> This is probably my #1 "I wish postgres had this feature" feature.  It
> is a serious scalability enhancer for big systems and a pain to work
> around not having.
> 
> 
> > I'd be curious to know what other people, who've crossed this same
> > bridge from MSSQL or Oracle or Sybase to PG, have devised,
> > faced with the same kind of desired performance gain for retrieving
> > blocks of rows with the same partial key.
> 
> 
> The CLUSTER command is often virtually useless for precisely the kinds
> of tables that need to be clustered.  My databases are on-line 24x7, and
> the tables that are ideal candidates for clustering are in the range of
> 50-100 million rows. I can afford to lock these tables up for no more
> than 5-10 minutes during off-peak in the hopes that no one notices, and
> CLUSTER does not work remotely in the ballpark of that fast for tables
> of that size.  People who can run CLUSTER in a cron job must either have
> relatively small tables or regular large maintenance windows.
> 
> 
> My solution, which may or may not work for you, was to write a table
> partitioning system using the natural flexibility and programmability of
> postgresql (e.g. table inheritance).  From this I automatically get a
> roughly ordered heap according to the index I would cluster on, with
> only slightly funky SQL access.  The end result works much better with
> CLUSTER too, though CLUSTER is much less necessary at that point
> because, at least for my particular purposes, the rows are mostly
> ordered due to how the data was partitioned.
> 
> So there are ways to work around CLUSTER, but you'll have to be clever
> and it will require tailoring the solution to your particular
> requirements.
> 
> 
> J. Andrew Rogers
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

In response to

Responses

pgsql-performance by date

Next:From: Josh BerkusDate: 2004-08-26 18:20:20
Subject: Re: Disabling transaction/outdated-tuple behaviour
Previous:From: Matt ClarkDate: 2004-08-26 18:18:52
Subject: Re: Disabling transaction/outdated-tuple behaviour

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group