Re: Deleting all but one row of a list of non-uniques

From: "Andrew Bartley" <abartley(at)evolvosystems(dot)com>
To: "Zak McGregor" <zak(at)mighty(dot)co(dot)za>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deleting all but one row of a list of non-uniques
Date: 2004-06-23 01:23:04
Message-ID: CAEAIPJHMAFOJGDANDACEEJBEMAA.abartley@evolvosystems.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Try this.

Create a temp table with a list of the duplicate unid's

eg

create temp table duplicates
as
select min(oid) as oid_val, unid from <table>
group by unid having count(*) > 1;

Then isolate the unwanted rows

update <table>
set unid = null <this could be any value you choose>
from duplicates
where <table>.unid = duplicates.unid
and oid_val <> <table>.oid

Then delete them

delete from <table> where unid is null

Thanks

Andrew

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org]On Behalf Of Zak McGregor
Sent: Wednesday, 23 June 2004 10:18 AM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] Deleting all but one row of a list of non-uniques

Hi all

I have a table, for simplicity's sake containing one field, called unid.

for example, select unid, oid from table gives me something like this:

unid | oid
---------+---------
XNO24ORK | 40276607
XNPGJDPU | 40276673 *
XNPGJDPU | 40276674 *
XNXAAVQ2 | 40277583
ZAZAFAA4 | 40276600
ZAZV5UG4 | 40276446
ZD66A1LL | 40277162
ZDXZ27RS | 40277454
ZEKRT3GM | 40277739 *
ZEKRT3GM | 40277740 *
ZEKRT3GM | 40277741 *

(I've marked the rows with duplicated unids with the * to the right)

I'd like to delete rows in such a way that one (and any one) row for each
unid
remains, and all other duplicates are removed. Does anyone have any ideas
that
may help me here please?

slecting distinct is not helpful as in reality there are other fields which
contain data like timestamps that will differ but I don't mind which one
remains, but I'd need to work with them on selects to the table afterwards.

I've tried:

delete from table where oid in (select p1.oid from table p1, table p2 where
p1.oid != p2.oid and p1.unid=p2.unid);

which only works in a few cases - I suspect where there are only two rows
with
the same unid. Is it even possible?

Alternatively, can I get the \copy command to ignore attempts to insert
duplicated rows into a UNIQUE column instead of aborting the copy? Not sure
if
any of the options that can be supplied to the table at creation time for
unique
will help here.

Thanks.

Ciao

Zak

--
========================================================================
http://www.carfolio.com/ Searchable database of 10 000+ car specs
========================================================================

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2004-06-23 01:49:33 Re: Connection gets into state where all queries fail
Previous Message Scott Ribe 2004-06-23 01:02:28 Re: Connection gets into state where all queries fail