Re: How to "unique-ify" HUGE table?

From: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To: "Kynn Jones" <kynnjo(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to "unique-ify" HUGE table?
Date: 2008-12-23 17:39:17
Message-ID: 20081223123917.df2f8c0e.darcy@druid.net
Lists: pgsql-performance

On Tue, 23 Dec 2008 12:25:48 -0500
"Kynn Jones" <kynnjo(at)gmail(dot)com> wrote:
> Hi everyone!
> I have a very large 2-column table (about 500M records) from which I want to
> remove duplicate records.
>
> I have tried many approaches, but they all take forever.
>
> The table's definition consists of two short TEXT columns. It is a
> temporary table generated from a query:
>
> CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
>
> Initially I tried
>
> CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
>
> but after waiting for nearly an hour I aborted the query, and repeated it

Do you have an index on x and y? Also, does this work better?

CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... GROUP BY x, y;

What does EXPLAIN ANALYZE have to say?
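
One way to compare the two forms before committing to a long run is to look at their plans side by side. The sketch below is only illustrative: the source query is elided in the original message, so "source_table" is a hypothetical stand-in for it. Plain EXPLAIN only plans the query, whereas EXPLAIN ANALYZE also executes it, so on a table of this size the plain form is the cheaper first check.

```sql
-- "source_table" is an assumed name standing in for the elided FROM clause.

-- Plan for the DISTINCT form (EXPLAIN alone does not execute the query).
EXPLAIN SELECT DISTINCT x, y FROM source_table;

-- Plan for the GROUP BY form suggested above.
EXPLAIN SELECT x, y FROM source_table GROUP BY x, y;

-- Once a plan looks reasonable, EXPLAIN ANALYZE runs the query and
-- reports actual row counts and timings next to the estimates.
EXPLAIN ANALYZE SELECT x, y FROM source_table GROUP BY x, y;
```

If the two plans differ (for example, a sort-based plan for one and a hash-based aggregate for the other), that difference is likely where the run time goes.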

--
D'Arcy J.M. Cain <darcy(at)druid(dot)net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
