Re: Updating a very large table

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Rafael Domiciano" <rafael(dot)domiciano(at)gmail(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Updating a very large table
Date: 2009-04-24 14:04:36
Message-ID: 49F180A4.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Rafael Domiciano <rafael(dot)domiciano(at)gmail(dot)com> wrote:

> this table has about 15 indexes...

That would tend to make mass updates like this slow.

> How good are to Cluster table? Has any criteria to cluster table?
> How can I do it?

CLUSTER reads through the table in the sequence of an index, which you
specify, and creates a new copy of the table and then replaces the
original table with this copy. The table then has no bloat and the
data rows will (until you start modifying the table) be in the same
sequence as that index.

You must have room for a second copy of the table in order for this to
succeed. All indexes, permissions, etc. are set to match the original
table. The only choice is which index to use -- if there is an index
which is often used to select a number of rows, it is a good candidate
for use in the CLUSTER, since that will minimize disk access.

As has already been pointed out, there are ways to do the same thing
with a sequential pass of the data. If there is no index which is
often used to select a number of rows, or the CLUSTER is unable to
complete in whatever maintenance window you have, the unordered
approach might be better than CLUSTER.

-Kevin

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message chen shan 2009-04-24 14:04:56 How to map columns in pg_stat_activity to windows PID
Previous Message Simon Riggs 2009-04-24 08:46:54 Re: postgres 8.2.9 can't drop database in single user mode