Re: Removing duplicate entries

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Scott Ford <Scott(dot)Ford(at)bullfrogpower(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Removing duplicate entries
Date: 2006-01-12 04:48:30
Message-ID: 20060112044830.GA90783@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Wed, Jan 11, 2006 at 04:21:45PM -0500, Scott Ford wrote:
> customer
> customer_id (pk)
> ...
>
> documentation
> documentation_id (pk)
> customer_id (fk)
> document_type_id (fk)
> accepted
>
> document_types
> document_type_id (pk)
> document_name
> ...
>
> What I want to do is remove duplicate entries for the same
> customer_id/document_type_id for only a certain document_type_id.

Will a modified version of my earlier query work? For example,
suppose I start with this data:

SELECT * FROM doc ORDER BY cust_id, doc_type_id, doc_id;
doc_id | cust_id | doc_type_id
--------+---------+-------------
1 | 1 | 1
2 | 1 | 1
3 | 1 | 2
4 | 1 | 2
5 | 1 | 2
6 | 1 | 3
7 | 1 | 3
8 | 2 | 1
9 | 2 | 2
10 | 2 | 3
(10 rows)

Then I run this delete to remove duplicates for doc_type_id 2:

DELETE FROM doc
WHERE doc_type_id = 2 AND doc_id NOT IN (
SELECT min(doc_id)
FROM doc
WHERE doc_type_id = 2 -- not necessary but probably more efficient
GROUP BY cust_id, doc_type_id
);

I end up with this:

SELECT * FROM doc ORDER BY cust_id, doc_type_id, doc_id;
doc_id | cust_id | doc_type_id
--------+---------+-------------
1 | 1 | 1
2 | 1 | 1
3 | 1 | 2
6 | 1 | 3
7 | 1 | 3
8 | 2 | 1
9 | 2 | 2
10 | 2 | 3
(8 rows)

cust_id 1's duplicates for doc_type_id 2 have been removed (doc_id
4 and 5) but cust_id 1's duplicates for doc_type_id 1 and 3 remain.
cust_id 2 had no duplicates; all of its records remain.

Is that what you're looking for? If not then please post some
sample data and describe exactly which records you want to delete.
It would be helpful if you post the example as CREATE TABLE and
INSERT statements that people can load into their own database.

--
Michael Fuhr

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Brendan Duddridge 2006-01-12 04:50:07 Re: Comparing databases
Previous Message Tom Lane 2006-01-12 04:42:18 Re: Sequential Scans On Complex Query With UNION - see why this fails