Skip site navigation (1) Skip section navigation (2)

Re: How to "unique-ify" HUGE table?

From: "George Pavlov" <gpavlov(at)mynewplace(dot)com>
To: "Kynn Jones" <kynnjo(at)gmail(dot)com>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: How to "unique-ify" HUGE table?
Date: 2008-12-23 18:14:24
Message-ID: 8C5B026B51B6854CBE88121DBF097A86033B2DED@ehost010-33.exch010.intermedia.net (view raw or flat)
Thread:
Lists: pgsql-performance
You don't say what PG version you are on, but just for kicks you may try
using GROUP BY instead of DISTINCT. Yes, the two should perform the
same, but with 8.1 (or maybe 8.0) I had seen situations where GROUP BY
was faster (admittedly this happened with more complex queries). So, try
this:

  CREATE TEMP TABLE huge_table AS SELECT x, y FROM foo GROUP BY 1, 2;

Note that you may be tempted to add an index on foo(x,y), but I don't
think that helps (or at least I have not been able to hit the index in
similar situations).


> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-
> owner(at)postgresql(dot)org] On Behalf Of Kynn Jones
> Sent: Tuesday, December 23, 2008 9:26 AM
> To: pgsql-performance(at)postgresql(dot)org
> Subject: [PERFORM] How to "unique-ify" HUGE table?
> 
> Hi everyone!
> 
> I have a very large 2-column table (about 500M records) from which I
> want to remove duplicate records.
> 
> I have tried many approaches, but they all take forever.
> 
> The table's definition consists of two short TEXT columns.  It is a
> temporary table generated from a query:
> 
> 
> CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
> 
> Initially I tried
> 
> CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
> 
> but after waiting for nearly an hour I aborted the query, and repeated
> it after getting rid of the DISTINCT clause.
> 
> Everything takes forever with this monster!  It's uncanny.  Even
> printing it out to a file takes forever, let alone creating an index
> for it.
> 
> Any words of wisdom on how to speed this up would be appreciated.
> 
> TIA!
> 
> Kynn
> 
> 


In response to

Responses

pgsql-performance by date

Next:From: Scott MarloweDate: 2008-12-23 18:37:07
Subject: Re: How to "unique-ify" HUGE table?
Previous:From: D'Arcy J.M. CainDate: 2008-12-23 17:39:17
Subject: Re: How to "unique-ify" HUGE table?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group