How to "unique-ify" HUGE table?

From:	"Kynn Jones" <kynnjo(at)gmail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	How to "unique-ify" HUGE table?
Date:	2008-12-23 17:25:48
Message-ID:	c2350ba40812230925jb50fed7h3dc58cc311888c7f@mail.gmail.com
Lists:	pgsql-performance

Hi everyone!
I have a very large 2-column table (about 500M records) from which I want to
remove duplicate records.

I have tried many approaches, but they all take forever.

The table's definition consists of two short TEXT columns. It is a
temporary table generated from a query:

CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;

Initially I tried

CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;

but after waiting for nearly an hour I aborted the query, and repeated it
after getting rid of the DISTINCT clause.

Everything takes forever with this monster! It's uncanny. Even printing it
out to a file takes forever, let alone creating an index for it.

Any words of wisdom on how to speed this up would be appreciated.

TIA!

Kynn

