Re: Re: Loading optimization

From: kleptog(at)cupid(dot)suninternet(dot)com (Martijn van Oosterhout)
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Ian Harding <iharding(at)pakrat(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Re: Loading optimization
Date: 2001-01-10 04:20:24
Message-ID: 20010110152024.B12844@cupid.suninternet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Jan 09, 2001 at 10:51:35AM -0500, Bruce Momjian wrote:
> Well, clustering certainly speeds up index access to multiple heap
> values because duplicate values are all on the same heap page. One
> thing that is missing is that there is no preference for index scans for
> clustered indexes.

Maybe that would be the simple way, just a flag. Alternatively, have VACUUM
ANALYZE estimate the "cohesiveness" of the data...

> Because the clustering is not permanent, but becomes unclustered as data
> is added/modified, there is no easy way to know if the clustering is
> still valid.

Well, in our case the table has over 1,000,000 rows and refer to items that
would appear on a bill. Since a bill is never changed after the fact, the
clustering is always in effect.

This table is a WORM table, once data is added, it is never updated. The
question is, is this typical of very large tables? If that is the case then
generally clustering would tend to stay rather than degrade.

Also, in our case, clustering by a single index is not really sufficient.
Within a single bill we would like to cluster the items by service. I was
thinking or writing a program that would do a pg_dump, order by columns as
requested and then dump it back in. I would've done it except that
Postgresql won't use the fact that it's sorted.

I for one am hoping for progress in this area. Unnessesary sequential scans
are painful when someone is waiting on the phone...

Martijn

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2001-01-10 04:36:20 Re: Re: COPY error: pqReadData() -- backend closed the channel unexpectedly
Previous Message Tatsuo Ishii 2001-01-10 02:11:11 Re: Re: starting PGSQL automatically on Redhat 6.2