Re: Netflix Prize data

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc: pg(at)mohawksoft(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Netflix Prize data
Date: 2006-10-05 08:35:01
Message-ID: 4524C3B5.8030206@enterprisedb.com
Lists: pgsql-hackers

Mark Woodward wrote:
>
> I tried to cluster the data along a particular index but had to cancel it
> after 3 hours.

If the data is in random order, it's faster to run

SELECT * INTO foo_sorted FROM foo ORDER BY bar;

and then CREATE INDEX on the new table than to run CLUSTER.

That's because CLUSTER reads the whole table through a full index scan,
which means random I/O on the heap when the table isn't already in index
order. A seqscan + sort is faster in that case.
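
Spelled out, the whole sequence might look something like the sketch below. The names foo and bar are the placeholders from the example above; foo_sorted_bar_idx and foo_old are made up for illustration, and the final rename is only one possible way to swap the sorted copy in. Note that SELECT INTO copies only the rows, so any other indexes, constraints, defaults, and permissions on foo would have to be recreated by hand.

```sql
-- Sketch only: foo/bar are placeholders, the index and foo_old names are hypothetical.

-- 1. Write a physically sorted copy using a sequential scan plus a sort.
SELECT * INTO foo_sorted FROM foo ORDER BY bar;

-- 2. Build the index on the sorted copy.
CREATE INDEX foo_sorted_bar_idx ON foo_sorted (bar);

-- 3. Refresh planner statistics for the new table.
ANALYZE foo_sorted;

-- 4. Optionally swap the sorted copy into place (assumes nothing else
--    depends on the old table and no one is writing to it meanwhile).
BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_sorted RENAME TO foo;
COMMIT;
```

The sort may need a generous work_mem (or maintenance_work_mem for the index build) to go quickly on a table of this size; the right values depend on the machine.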

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
