Re: simple LEFT JOIN giving wrong results ...

From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: "Marc G(dot) Fournier" <scrappy(at)hub(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: simple LEFT JOIN giving wrong results ...
Date: 2003-12-12 00:47:23
Message-ID: 20031211163626.N45954@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Thu, 11 Dec 2003, Marc G. Fournier wrote:

>
> I've got to be missing something obvious here ... I have two tables, on
> containing traffic stats, the other disk usage ... I want to do a JOIN in
> one query to give me the sum of traffic and average of storage ...
> seperately, the results are right .. put together, traffic values are way
> off, while storage is still correct ...
>
> So, I'm doing my LEFT JOIN wrong *somehow*, but its eluding me as to
> what/how .. :(
>
> ams=# select ct.ip_id, sum(ct.bytes) as traffic,
> avg(cs.bytes)::bigint as storage
> from company_00186.traffic ct
> left join company_00186.storage cs ON ( ct.ip_id = cs.ip_id AND
> month_trunc(cs.runtime) = '2003-12-01')
> where month_trunc(ct.runtime) = '2003-12-01' group by ct.ip_id;

If there are multiple cs rows that match a particular ct row with those
constraints you're going to be getting a multiple of the ct.bytes value
with sum because you're going to have an extra ct.bytes for
each matching cs row.

If you had two traffic rows with ip_id 1 with bytes 10 and 5 and two
storage rows with ip_id 1 with bytes 5 and 3 (and assuming that they both
are in the right month), the join should give you a set like:
((ip_id=1, ct.bytes=10, cs.bytes=5), (ip_id=1, ct.bytes=10, cs.bytes=3),
(ip_id=1, ct.bytes=5, cs.bytes=5), (ip_id=1, ct.bytes=5, cs.bytes=3))
I don't think a join between the two tables is what you really want.

Maybe a join between the two group by querys, something like:
select ip_id, traffic, storage
from (select ip_id, sum(ct.bytes) as traffic
from company_00186.traffic ct where month_trunc(runtime)='2003-12-01')
a left outer join
(select ip_id, avg(ct.bytes) as storage
from company_00186.storage ct where month_trunc(runtime)='2003-12-01')
b using (ip_id);

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Marc G. Fournier 2003-12-12 20:53:55 Skip dups on INSERT instead of generating an error ...
Previous Message Marc G. Fournier 2003-12-11 23:54:42 Re: simple LEFT JOIN giving wrong results ...