Re: identifying duplicates in table with redundancies

From: Oliver d'Azevedo Christina <oliveiros(dot)cristina(at)gmail(dot)com>
To: Tarlika Elisabeth Schmitz <postgresql(at)numerixtechnology(dot)de>
Cc: "<pgsql-sql(at)postgresql(dot)org>" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: identifying duplicates in table with redundancies
Date: 2010-09-24 17:12:18
Message-ID: 065CBEA1-1647-4233-BBB6-48E879FD91B0@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hey,Tarlika.
I tried to reproduce your test case through a series of inserts and It
seems that the lower case "d" went unnoticed.
That explains the empty list i got as result. My fault. Sorry :-(
Great to hear it helped you

Best,
Oliveiros

Enviado de meu iPhone

Em 24/09/2010, às 05:12 PM, "Tarlika Elisabeth Schmitz" <postgresql(at)numerixtechnology(dot)d
e> escreveu:

> Dear Oliveiros,
> Thank you for taking the time to help.
>
> On Fri, 24 Sep 2010 11:22:21 +0100
> "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt> wrote:
>
>> ----- Original Message -----
>> From: "Tarlika Elisabeth Schmitz" <postgresql(at)numerixtechnology(dot)de>
>> To: <pgsql-sql(at)postgresql(dot)org>
>> Sent: Thursday, September 23, 2010 10:39 PM
>> Subject: [SQL] identifying duplicates in table with redundancies
>>
>>
>>> [...] I want to check for duplicates:
>>>
>>> 1) multiples trainer names for same trainer id
>>> 2) multiple trainer ids for same trainer name
>>>
>>> I cobbled together the SQL and it does the job but it seems rather
>>> convoluted. I would like to know how I can improve it.
>>>
>>> CREATE TABLE student (
>>> id INTEGER NOT NULL,
>>> name VARCHAR(256) NOT NULL,
>>> trainer_id INTEGER,
>>> trainer_name VARCHAR(256),
>>> );
>>>
>>> ====
>>> EXAMPLE DATA
>>>
>>> 22 John 1 Macdonald
>>> 23 Jane 1 MacDonald
>>> 24 Paul 1 MacDonald
>>> 25 Dick 2 Smith
>>> 26 Bill 3 Smith
>>> 27 Kate 3 Smith
>>> ====
>>>
>>> select trainer_id, trainer_name from
>>> (
>>> select distinct on (trainer_name) trainer_id, trainer_name
>>> from student
>>> where trainer_id in
>>> (
>>> select distinct on (id) id
>>> from
>>> (
>>> select distinct on (trainer_id,trainer_name)
>>> trainer_id as id,
>>> trainer_name as name from student
>>> ) as trainer
>>> group by trainer.id
>>> having count (trainer.name) > 1
>>> )
>>> ) as y
>>> order by trainer_id
>
>
>> Howdy, Tarlika.
>>
>> First, did you past correctly your query into your mail?
>> I am asking this because your query doesn't seem work for me, it
>> returns an empty list :-|
>>
>> Your most nested query, [...]
>> returns this
>>
>> 1|"MacDonald"
>> 2|"Smith"
>> 3|"Smith"
>
>
> For me, the innermost query returns:
> 1|"Macdonald"
> 1|"MacDonald"
> 2|"Smith"
> 3|"Smith"
>
> (note the lower/uppercase "d" in MacDonald)
>
> The whole query returns:
> 1|"Macdonald"
> 1|"MacDonald"
>
>
>>> 1) multiples trainer names for same trainer id
>>> 2) multiple trainer ids for same trainer name
>>
>> To achieve 2) I would use this query
>>
>> SELECT DISTINCT trainer_id,trainer_name
>> FROM (
>> SELECT trainer_name -- The field you want to test for duplicates
>> FROM (
>> SELECT DISTINCT "trainer_id","trainer_name"
>> FROM student
>> ) x
>> GROUP BY "trainer_name" -- the field you want to test for
>> duplicates
>> HAVING (COUNT(*) > 1)
>> ) z
>> NATURAL JOIN student y
>
> I see my 2 innermost queries are the same as yours, just a bit more
> wordy. I messed up at the third query, which threw up an error when I
> tried to add an ORDER BY.
>
>> It will give you a list of the trainer names who have more than one
>> trainer ID and the respective trainer IDS.
>> For your particular example data result will be
>> 2|"Smith"
>> 3|"Smith"
>
> Splendid! Just what I wanted.
>
>> As Smith is the only trainer with records with diferent trainer IDs.
>> Question : Can this be what you want?
>
> The real table has 250000 entries and quite a few dups.
>
>> If you want to achieve 1) just substitute the trainer_name by
>> trainer_id on the commented places.
>
> 1) works as well now - just had to transpose id/name.
>
>
>
>
>
> --
>
> Best Regards,
> Tarlika Elisabeth Schmitz
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tarlika Elisabeth Schmitz 2010-09-24 17:39:36 Re: identifying duplicates in table with redundancies
Previous Message Tarlika Elisabeth Schmitz 2010-09-24 16:12:26 Re: identifying duplicates in table with redundancies