Re: identifying duplicates in table with redundancies

From: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
To: "Tarlika Elisabeth Schmitz" <postgresql(at)numerixtechnology(dot)de>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: identifying duplicates in table with redundancies
Date: 2010-09-28 10:34:31
Message-ID: 5F1C28C4D61848D6A012C5DD84B22226@marktestcr.marktest.pt
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello again,
Tarlika.

In what concerns to indices, I 'm affraid I may not be the best person to
advise you, my knowledge of them hardly goes beyond the most trivial cases.

I'm sure there are plenty of other people on the list who are able to give
you better advise than me.
But, on this query in particular I would recomend an indice on trainer_name,
as this field will be used on the join and on the group by.
For the other query, the one you get by substituting trainer_name by
trainer_id, place an index on trainer_id.
Also, these indexes may help speed up the order by clause, if you use one.

If you have a table with lots of data you can try them around and see how
performance varies (and don't forget there's also EXPLAIN ANALYZE)

Bear in mind, though, that this is just my 2 cents on a matter that I don't
really master. Do not take this as an expert answer.

There are many people on this list that can help you better

Best,
Oliveiros

----- Original Message -----
From: "Tarlika Elisabeth Schmitz" <postgresql(at)numerixtechnology(dot)de>
To: <pgsql-sql(at)postgresql(dot)org>
Sent: Monday, September 27, 2010 5:54 PM
Subject: Re: [SQL] identifying duplicates in table with redundancies

> On Fri, 24 Sep 2010 18:12:18 +0100
> Oliver d'Azevedo Christina <oliveiros(dot)cristina(at)gmail(dot)com> wrote:
>
>>>> SELECT DISTINCT trainer_id,trainer_name
>>>> FROM (
>>>> SELECT trainer_name -- The field you want to test for duplicates
>>>> FROM (
>>>> SELECT DISTINCT "trainer_id","trainer_name"
>>>> FROM student
>>>> ) x
>>>> GROUP BY "trainer_name" -- the field you want to test for
>>>> duplicates
>>>> HAVING (COUNT(*) > 1)
>>>> ) z
>>>> NATURAL JOIN student y
>
>
>
> What indices would you recommend for this operation?
>
> --
>
> Best Regards,
> Tarlika Elisabeth Schmitz
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message msi77 2010-09-28 14:00:45 Re: is there a distinct function for comma lists ?
Previous Message gangadhar bandaru 2010-09-28 09:02:23 eB7E2R