Re: Removing duplicate entries

From: "Scott Ford" <Scott(dot)Ford(at)bullfrogpower(dot)com>
To: "Michael Fuhr" <mike(at)fuhr(dot)org>
Cc: <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Removing duplicate entries
Date: 2006-01-12 17:03:19
Message-ID: CAA10533F41BB7448F7659D4D067190742FB1A@cx41.800onemail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Right. Missed the *not*. It worked just fine on the test data. Thanks
again, Mike.

Scott.

-----Original Message-----
From: Michael Fuhr [mailto:mike(at)fuhr(dot)org]
Sent: January 12, 2006 12:00 PM
To: Scott Ford
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: [NOVICE] Removing duplicate entries

[Please copy the mailing list on replies.]

On Thu, Jan 12, 2006 at 10:24:10AM -0500, Scott Ford wrote:
> That is exactly the result that I'm looking for. But won't that
> actually remove the documents with doc_type_id where there is only one
> for a given customer too?

No, it shouldn't (see below).

> Maybe I need to read up on my GROUP BY clause.
>
> I thought that:
>
> SELECT min(doc_id)
> FROM doc
> WHERE doc_type_id = 2 -- not necessary but probably more efficient
> GROUP BY cust_id, doc_type_id
>
> would select one instance of doc_type_id = 2 from each customer if
there
> was one, and would select the one with the lowest doc_id if there was
> more then one.

Right. This subquery generates the list of doc_id's *not* to delete
(you could use max instead of min; the point is to choose exactly
one out of the set). The WHERE clause in the subquery isn't necessary
because the DELETE already restricts the affected records to those
with doc_type_id 2; however, adding the restriction to the subquery
generates a smaller list of doc_id's to check against, which might
make the delete faster. In other words, the delete statement says
"delete all records with doc_type_id 2 except for these", where
"these" is the list generated by the subquery. "These" can contain
but doesn't need to contain doc_id's for documents with a doc_type_id
other than 2 because we've already said they're not eligible for
deletion.

> So then wouldn't the row:
>
> doc_id | cust_id | doc_type_id
> --------+---------+-------------
> 9 | 2 | 2
>
> get removed too?

No, as the example I posted shows (the output came from sample data
and actually running the queries shown).

> I'll test this out some test data in the meant time.

Please do: it's important that you understand how the delete works
and be satisfied that it *does* work. As I mentioned before, I
might be misunderstanding your requirements or making unwarranted
assumptions about your data; ultimately it's up to you to determine
whether the delete works or not.

--
Michael Fuhr

Browse pgsql-novice by date

  From Date Subject
Next Message operationsengineer1 2006-01-12 17:08:59 Sort of Complex Query - Howto Eliminate Repeating Results
Previous Message Michael Fuhr 2006-01-12 16:59:33 Re: Removing duplicate entries