Re: Netflix Prize data

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc:	pg(at)mohawksoft(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Netflix Prize data
Date:	2006-10-05 08:35:01
Message-ID:	4524C3B5.8030206@enterprisedb.com
Lists:	pgsql-hackers

Mark Woodward wrote:
>
> I tried to cluster the data along a particular index but had to cancel it
> after 3 hours.

If the data is in random order, it's faster to do

SELECT * INTO foo_sorted FROM foo ORDER BY bar;

and then CREATE INDEX on the sorted copy than to run CLUSTER.

That's because CLUSTER reads the whole table through the index, in index
order, which means random I/O when the table is not already clustered.
That is slower than a seqscan + sort.
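
For reference, the whole workaround might look roughly like the sketch
below. The names foo_sorted_bar_idx and foo_old are made up for
illustration, and the rename step is an assumption about how one would
swap the sorted copy into place. Note that SELECT INTO copies only the
rows, so any other indexes, constraints, defaults and permissions on foo
would have to be recreated by hand.

```sql
-- Build a physically sorted copy with a sequential scan plus sort.
SELECT * INTO foo_sorted FROM foo ORDER BY bar;

-- Index the sorted copy (index name is hypothetical).
CREATE INDEX foo_sorted_bar_idx ON foo_sorted (bar);

-- Swap the copy into place; foo_old is a hypothetical name for the original.
BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_sorted RENAME TO foo;
COMMIT;

-- Refresh planner statistics for the new table.
ANALYZE foo;
```

Once the new table has been checked, foo_old can be dropped.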

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
