From: | "Kynn Jones" <kynnjo(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | How to "unique-ify" HUGE table? |
Date: | 2008-12-23 17:25:48 |
Message-ID: | c2350ba40812230925jb50fed7h3dc58cc311888c7f@mail.gmail.com |
Lists: | pgsql-performance |
Hi everyone!
I have a very large 2-column table (about 500M records) from which I want to
remove duplicate records.
I have tried many approaches, but they all take forever.
The table's definition consists of two short TEXT columns. It is a
temporary table generated from a query:

    CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;

Initially I tried

    CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;

but after waiting for nearly an hour I aborted the query, and repeated it
after getting rid of the DISTINCT clause.
Everything takes forever with this monster! It's uncanny. Even printing it
out to a file takes forever, let alone creating an index for it.
Any words of wisdom on how to speed this up would be appreciated.
TIA!
Kynn