Re: questions about CLUSTER

From: Douglas J Hunley <doug(at)hunley(dot)homeip(dot)net>
To: Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: questions about CLUSTER
Date: 2008-02-27 18:35:16
Message-ID: 200802271335.16397.doug@hunley.homeip.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wednesday 27 February 2008 12:40:57 Bill Moran wrote:
> In response to Douglas J Hunley <doug(at)hunley(dot)homeip(dot)net>:
> > After reviewing
> > http://www.postgresql.org/docs/8.3/static/sql-cluster.html a couple of
> > times, I have some questions:
> > 1) it says to run analyze after doing a cluster. i'm assuming autovacuum
> > will take care of this? or should i go ahead and do the analyze 'now'
> > instead of waiting?

> It's always a good idea to analyze after major DB operations. Autovacuum
> only runs so often. Also, it won't hurt anything, so why risk not doing
> it?

being overly-cautious. i was concerned about both autovac and me doing
analyzes over each other

>
> > 2) is there any internal data in the db that would allow me to
> > programmatically determine which tables would benefit from being
> > clustered? 3) for that matter, is there info to allow me to determine
> > which index it should be clustered on in cases where the table has more
> > than one index?
>
> The pg_stat_user_indexes table keeps stats on how often the index is used.
> Indexes that are used frequently are candidates for clustering.

I had just started looking at this actually.

>
> > 4) for tables with >1 indexes, does clustering on one index negatively
> > impact queries that use the other indexes?
>
> Not likely. Clustering only really helps performance if you have an index
> that is used to gather ranges of data. For example, if you frequently
> do things like SELECT * FROM log WHERE logdate > 'somedate" and <
> 'somedate, you might benefit from clustering on logdate.
>
> But it doesn't really do much if you're only ever pulling one record at a
> time. It's the kind of thing that you really need to experiment with to
> understand whether it will have a worthwhile performance impact on your
> data and your workload. I doubt if there's any pat answer.

makes sense.

>
> > 5) is it better to cluster on a compound index (index on
> > lastnamefirstname) or on the underlying index (index on lastname)?
>
> If cluster helps you at all, it's going to help if you have an index that's
> frequently used to fetch ranges of data. Whether that index is compound or
> not isn't likely to factor in.

understood. i didn't really think it would matter, but its easier to ask than
to screw up performance for existing customers :)

--
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

If a turtle doesn't have a shell, is he homeless or naked?

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Douglas J Hunley 2008-02-27 18:45:11 Re: questions about CLUSTER
Previous Message Bill Moran 2008-02-27 17:40:57 Re: questions about CLUSTER