JOIN producing duplicate results

From: Lonni J Friedman <netllama(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: JOIN producing duplicate results
Date: 2012-05-02 23:51:54
Message-ID: CAP=oouHhqwj6j2tDz_YQ5RaJVsRopPtm6Dy9cFxYy8RxR_OGFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

I've got a query that is joining data across 4 tables to provide data
based on test results. The query is working fine, except for the fact
that its returning two identical records for each row of unique data.
If I throw a DISTINCT in front of the primary key column (a.id) of
one of the tables in the join, that eliminates all the duplicates.
However, I've read (and found) that DISTINCT tends to introduce a
performance hit, so I'm hoping to find a better performing solution,
if possible. Hopefully I'm just doing something silly with my JOINS
that is easily fixed. This is on postgresql-9.0.x, and yes I'm aware
that if i upgraded to 9.1.x then I could likely do a 'group by a.id',
but for now I'm stuck on 9.0.x.

Here's the query:
SELECT a.id,a.suiteid,a.testname,date_trunc('second',a.last_update) AS
last_update,regexp_replace(p.relname,E'tests','','g'),o.osname
FROM smoketests AS a, pg_class AS p, smoke AS t, osversmap AS o
WHERE a.osversion=o.osversion AND a.suiteid=t.id AND a.tableoid=p.oid
AND ( a.current_status='FAILED' ) AND ( a.arch='i386' ) AND (
a.os='Darwin' ) AND a.last_update>'2012-05-01 04:00:00' AND
a.last_update<'2012-05-02 14:20:45'
ORDER BY a.id ;

id | suiteid | testname | last_update |
regexp_replace | osname
----------+---------+------------------+---------------------+----------------+------------
32549818 | 668232 | bug377064 | 2012-05-01 08:38:07 | smoke
| OSX-10.7.x
32549818 | 668232 | bug377064 | 2012-05-01 08:38:07 | smoke
| OSX-10.7.x
32549819 | 668232 | funcmem_resize | 2012-05-01 08:38:07 | smoke
| OSX-10.7.x
32549819 | 668232 | funcmem_resize | 2012-05-01 08:38:07 | smoke
| OSX-10.7.x
32549820 | 668232 | leitest | 2012-05-01 08:38:07 | smoke | OSX-10.7.x
32549820 | 668232 | leitest | 2012-05-01 08:38:07 | smoke | OSX-10.7.x

The problem is visible in the id column, where there are two of each
value returned even though a.id is the unique primary key of the smoke
table and doesn't really have duplicates. The 'smoke' table has a one
to many relationship with the smoketests table, but I'm still rather
confused why I'm getting the duplicates of everything.

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Binand Sethumadhavan 2012-05-03 04:09:42 Re: JOIN producing duplicate results
Previous Message Josh Kupershmidt 2012-05-02 22:40:53 Re: PostgresSQL 8.4 - Data output - Newline