Quick Links

Re: SQL for removing duplicates?

From:	Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
To:	kynn(at)panix(dot)com
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: SQL for removing duplicates?
Date:	2006-06-13 18:26:04
Message-ID:	448F033C.1020708@ca.afilias.info
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-novice

kynn(at)panix(dot)com wrote:
> Hi. I'm stumped. I have a large table (about 8.5M records), let's
> call it t, whose columns include x and y. I want to remove records
> from this table so that any pair of values for these two fields appear
> only once. (This will get rid of about 15% of the records in t.)
>
> One simple solution would be something like
>
> CREATE TABLE tmp AS SELECT DISTINCT ON ( x, y ) * FROM t;
> DROP TABLE t;
> ALTER TABLE tmp RENAME TO t;
>
> This works, but it uses a lot of space. I would prefer to simply cull
> the unwanted records from t, but I just can't figure out the SQL for
> it. Any help with it would be *much* appreciated.

If your table is created with OIDs, this should work. If not add a
unique column to the table and use that in place of oid.

DELETE FROM t where oid IN (select t2.oid from t t2 EXCEPT SELECT
max(t3.oid) from t t3 group by t3.x, t3.y);

Also note, the query plan for this is going to be very ugly, it might
very well be cheaper to use the solution that you initially mentioned.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.

In response to

SQL for removing duplicates? at 2006-06-13 16:11:41 from kynn

Responses

Re: SQL for removing duplicates? at 2006-06-13 22:22:37 from kynn

Browse pgsql-novice by date

	From	Date	Subject
Next Message	kynn	2006-06-13 22:22:37	Re: SQL for removing duplicates?
Previous Message	kynn	2006-06-13 16:11:41	SQL for removing duplicates?