
CLUSTERing on Insert

From:	CG <cgg007(at)yahoo(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	CLUSTERing on Insert
Date:	2006-09-18 05:21:27
Message-ID:	20060918052127.93788.qmail@web37907.mail.mud.yahoo.com
Lists:	pgsql-general

As I'm waiting for a CLUSTER operation to finish, it occurs to me that in a lot of cases, the performance benefits of having one's data stored on disk in index order can outweigh the overhead involved in inserting data on disk in index order... Just an idea I thought I'd throw out. :)

Also, the CLUSTER operation is about as straightforward as one can get. It basically copies each row, one by one, in index order into a new table, reindexes, then renames the new table to preserve references. I've been thinking about how to speed up the copy process. Perhaps taking contiguous blocks of data and moving them into place would save some I/O time. Locking the table is another problem. Would it be impossible to perform the CLUSTER within the context of a READ COMMITTED transaction, and then pick up the leftover CRUD rows and put them at the end of the file? The existing code assumes that the table is not altered while CLUSTER runs. With this approach, that assumption would no longer be needed.
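
To make the copy-and-swap idea concrete, here is a toy sketch in Python. It is only a model of the steps described above, not PostgreSQL's actual implementation: the "table" is a plain list of rows, the "index" is a list of row positions sorted by key, and the names cluster, build_index and cluster_with_leftovers are made up for illustration. The last function models the suggestion of appending rows that changed during the copy to the end of the new file.

```python
def build_index(heap, key):
    """Return row positions ordered by the index key (a stand-in for an index)."""
    return sorted(range(len(heap)), key=lambda pos: key(heap[pos]))


def cluster(heap, key):
    """Copy rows one by one, in index order, into a new table."""
    index = build_index(heap, key)
    return [heap[pos] for pos in index]


def cluster_with_leftovers(heap, key, late_rows):
    """Hypothetical variant: rows that arrived during the copy go at the end.

    The new table is in key order except for the appended tail, and the
    index is rebuilt afterwards so lookups still see every row.
    """
    new_heap = cluster(heap, key)
    new_heap.extend(late_rows)          # leftover rows, not in key order
    new_index = build_index(new_heap, key)
    return new_heap, new_index


rows = [(30, "c"), (10, "a"), (20, "b")]
table, index = cluster_with_leftovers(rows, key=lambda r: r[0],
                                      late_rows=[(15, "x")])
```

In this model the bulk of the table ends up physically ordered, while the appended tail is reachable only through the rebuilt index; how large that tail could grow before the ordering benefit is lost is exactly the open question.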

I'm sure I'm not the first person to scratch his head thinking about CLUSTER. Maybe I just don't really understand the limitations that are out there preventing these things from being created. But what else is there to do at 1 AM on a Sunday night waiting for a 500MB table to CLUSTER? :)

CG

Responses

Re: CLUSTERing on Insert at 2006-09-22 16:53:59 from Jim C. Nasby
