Re: CLUSTER all tables

From: Alvaro Herrera <alvherre(at)atentus(dot)com>
To: Alvaro Herrera <alvherre(at)atentus(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: CLUSTER all tables
Date: 2002-09-01 05:30:21
Message-ID: 20020901013021.738fd248.alvherre@atentus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

En Fri, 30 Aug 2002 19:02:47 -0400 (CLT)
Alvaro Herrera <alvherre(at)atentus(dot)com> escribió:

> What it does:
>
> - if CLUSTER is called with no arguments, cluster all indexes that
> have indisclustered set (in the current database). There's no "ALL"
> argument: that's just pollution IMHO.
>
> - Gets a list of such indexes (checking ownership of each) and passes
> them one by one to the standard cluster routine (modified a little
> so it accepts OIDs of table and index, not names).

The same as before, but I do include documentation patch (feel free to
rewrite or suggest improvements) and regression test.

Now I'm thinking about concurrency: suppose table A and B have
indisclustered set on indexes ind_a and ind_b respectively. The
user fires a CLUSTER without arguments; the backend will begin
clustering table A on ind_a.

Now, while this is going on, the user fires a CLUSTER on table B on
index ind_b_2, on a separate session. This table is shorter than table
A and finishes first.

When the first cluster finishes clustering table A, it will start
clustering table B on ind_b. This is because the cluster-all creates a
list of the tables to be clustered, and _then_ it clusters them one by
one. So the info saved about table B is old and overrides the new
cluster that the user has done on another session.

The question is: is this a situation worth to protect against? and what
is the best way to do it? I can see two ways:

1. allow only one cluster operation at the same time, with some kind of
lock (can the lightweight lock manager be used for this?)

2. generate the list of tables as it goes. This requires keeping
pg_index open (with AccessShareLock) for a potentially long time (I
don't need to tell you that cluster can be *slow*). Is this
acceptable?

Is there another?

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"Granting software the freedom to evolve guarantees only different
results, not better ones." (Zygo Blaxell)

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Alvaro Herrera 2002-09-01 06:29:25 Re: CLUSTER all tables
Previous Message Vishnu 2002-09-01 04:55:41 syntax to access/retrieve data from multiple databases