Skip site navigation (1) Skip section navigation (2)

Re: questions about CLUSTER

From: Douglas J Hunley <doug(at)hunley(dot)homeip(dot)net>
To: Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: questions about CLUSTER
Date: 2008-02-27 18:35:16
Message-ID: 200802271335.16397.doug@hunley.homeip.net (view raw or flat)
Thread:
Lists: pgsql-performance
On Wednesday 27 February 2008 12:40:57 Bill Moran wrote:
> In response to Douglas J Hunley <doug(at)hunley(dot)homeip(dot)net>:
> > After reviewing
> > http://www.postgresql.org/docs/8.3/static/sql-cluster.html a couple of
> > times, I have some questions:
> > 1) it says to run analyze after doing a cluster. i'm assuming autovacuum
> > will take care of this? or should i go ahead and do the analyze 'now'
> > instead of waiting?

> It's always a good idea to analyze after major DB operations.  Autovacuum
> only runs so often.  Also, it won't hurt anything, so why risk not doing
> it?

being overly-cautious. i was concerned about both autovac and me doing 
analyzes over each other

>
> > 2) is there any internal data in the db that would allow me to
> > programmatically determine which tables would benefit from being
> > clustered? 3) for that matter, is there info to allow me to determine
> > which index it should be clustered on in cases where the table has more
> > than one index?
>
> The pg_stat_user_indexes table keeps stats on how often the index is used.
> Indexes that are used frequently are candidates for clustering.

I had just started looking at this actually.

>
> > 4) for tables with >1 indexes, does clustering on one index negatively
> > impact queries that use the other indexes?
>
> Not likely.  Clustering only really helps performance if you have an index
> that is used to gather ranges of data.  For example, if you frequently
> do things like SELECT * FROM log WHERE logdate > 'somedate" and <
> 'somedate, you might benefit from clustering on logdate.
>
> But it doesn't really do much if you're only ever pulling one record at a
> time.  It's the kind of thing that you really need to experiment with to
> understand whether it will have a worthwhile performance impact on your
> data and your workload.  I doubt if there's any pat answer.

makes sense.

>
> > 5) is it better to cluster on a compound index (index on
> > lastnamefirstname) or on the underlying index (index on lastname)?
>
> If cluster helps you at all, it's going to help if you have an index that's
> frequently used to fetch ranges of data.  Whether that index is compound or
> not isn't likely to factor in.

understood. i didn't really think it would matter, but its easier to ask than 
to screw up performance for existing customers :)

-- 
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

If a turtle doesn't have a shell, is he homeless or naked?

In response to

Responses

pgsql-performance by date

Next:From: Douglas J HunleyDate: 2008-02-27 18:45:11
Subject: Re: questions about CLUSTER
Previous:From: Bill MoranDate: 2008-02-27 17:40:57
Subject: Re: questions about CLUSTER

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group