Re: CLUSTER and synchronized scans and pg_dump et al

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CLUSTER and synchronized scans and pg_dump et al
Date: 2008-01-28 17:03:42
Message-ID: 479DB68E.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers

>>> On Mon, Jan 28, 2008 at 10:36 AM, in message <3001(dot)1201538162(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> in general pg_dump's charter is to reproduce
> the state of the database as best it can, not to "improve" it.

I've often seen pg_dump recommended as a way to eliminate bloat.

There seem to be practical use cases where it would be a pain to
have to run CLUSTER right after piping pg_dump output into psql.

This does seem like the right way to do it where a user really wants
to maintain the physical sequence. My biggest concern is that
CLUSTER is sometimes used only to eliminate bloat, with no real
interest in maintaining that sequence later. I'd bet that people
generally do not alter the table to remove the clustered index
choice, so this option could be rather painful somewhere
downstream, once the sequence has become pretty random.

Maybe it would make sense if it were not the default, and the issues
were properly documented under the description of the option?

-Kevin
