Re: JOIN producing duplicate results

From: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
To: "Brent Dombrowski" <brent(dot)dombrowski(at)gmail(dot)com>, "Lonni J Friedman" <netllama(at)gmail(dot)com>
Cc: <pgsql-novice(at)postgresql(dot)org>
Subject: Re: JOIN producing duplicate results
Date: 2012-05-03 13:53:31
Message-ID: 10DA95B1BEDD4D47AD0F6E3BE02EB245@marktestcr.marktest.pt
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Do you have any table inheriting from smoketests?

Best,
Oliveiros

----- Original Message -----
From: "Brent Dombrowski" <brent(dot)dombrowski(at)gmail(dot)com>
To: "Lonni J Friedman" <netllama(at)gmail(dot)com>
Cc: <pgsql-novice(at)postgresql(dot)org>
Sent: Thursday, May 03, 2012 2:22 PM
Subject: Re: [NOVICE] JOIN producing duplicate results

You are not using a.id in the join. My guess is that all the other columns
in a that you are joining against are not unique and that is where the
duplicates are coming from.

Brent.

On May 2, 2012, at 4:51 PM, Lonni J Friedman wrote:

> I've got a query that is joining data across 4 tables to provide data
> based on test results. The query is working fine, except for the fact
> that its returning two identical records for each row of unique data.
> If I throw a DISTINCT in front of the primary key column (a.id) of
> one of the tables in the join, that eliminates all the duplicates.
> However, I've read (and found) that DISTINCT tends to introduce a
> performance hit, so I'm hoping to find a better performing solution,
> if possible. Hopefully I'm just doing something silly with my JOINS
> that is easily fixed. This is on postgresql-9.0.x, and yes I'm aware
> that if i upgraded to 9.1.x then I could likely do a 'group by a.id',
> but for now I'm stuck on 9.0.x.
>
> Here's the query:
> SELECT a.id,a.suiteid,a.testname,date_trunc('second',a.last_update) AS
> last_update,regexp_replace(p.relname,E'tests','','g'),o.osname
> FROM smoketests AS a, pg_class AS p, smoke AS t, osversmap AS o
> WHERE a.osversion=o.osversion AND a.suiteid=t.id AND a.tableoid=p.oid
> AND ( a.current_status='FAILED' ) AND ( a.arch='i386' ) AND (
> a.os='Darwin' ) AND a.last_update>'2012-05-01 04:00:00' AND
> a.last_update<'2012-05-02 14:20:45'
> ORDER BY a.id ;
>
> id | suiteid | testname | last_update |
> regexp_replace | osname
> ----------+---------+------------------+---------------------+----------------+------------
> 32549818 | 668232 | bug377064 | 2012-05-01 08:38:07 | smoke
> | OSX-10.7.x
> 32549818 | 668232 | bug377064 | 2012-05-01 08:38:07 | smoke
> | OSX-10.7.x
> 32549819 | 668232 | funcmem_resize | 2012-05-01 08:38:07 | smoke
> | OSX-10.7.x
> 32549819 | 668232 | funcmem_resize | 2012-05-01 08:38:07 | smoke
> | OSX-10.7.x
> 32549820 | 668232 | leitest | 2012-05-01 08:38:07 | smoke |
> OSX-10.7.x
> 32549820 | 668232 | leitest | 2012-05-01 08:38:07 | smoke |
> OSX-10.7.x
>
> The problem is visible in the id column, where there are two of each
> value returned even though a.id is the unique primary key of the smoke
> table and doesn't really have duplicates. The 'smoke' table has a one
> to many relationship with the smoketests table, but I'm still rather
> confused why I'm getting the duplicates of everything.
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Merlin Moncure 2012-05-03 14:29:00 Re: how to tell if a tuple has been inserted in current transaction
Previous Message pg noob 2012-05-03 13:50:28 how to tell if a tuple has been inserted in current transaction