Skip site navigation (1) Skip section navigation (2)

SQL for removing duplicates?

From: <kynn(at)panix(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: SQL for removing duplicates?
Date: 2006-06-13 16:11:41
Message-ID: 200606131611.k5DGBff21012@panix3.panix.com (view raw or flat)
Thread:
Lists: pgsql-novice


Hi.  I'm stumped.  I have a large table (about 8.5M records), let's
call it t, whose columns include x and y.  I want to remove records
from this table so that any pair of values for these two fields appear
only once.  (This will get rid of about 15% of the records in t.)

One simple solution would be something like

  CREATE TABLE tmp AS SELECT DISTINCT ON ( x, y ) * FROM t;
  DROP TABLE t;
  ALTER TABLE tmp RENAME TO t;

This works, but it uses a lot of space.  I would prefer to simply cull
the unwanted records from t, but I just can't figure out the SQL for
it.  Any help with it would be *much* appreciated.

Thanks!

kj


Responses

pgsql-novice by date

Next:From: Brad NicholsonDate: 2006-06-13 18:26:04
Subject: Re: SQL for removing duplicates?
Previous:From: Christoph FrickDate: 2006-06-13 15:29:13
Subject: order by desc - with zeros on top

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group