Re: Weird join result

From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Peter Nixonn <listuser(at)peternixon(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Weird join result
Date: 2006-08-16 17:44:11
Message-ID: 20060816102308.Q7680@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 16 Aug 2006, Peter Nixonn wrote:

> I am getting a result for an JOIN that I think is wrong. Maybe its my
> understanding that is wrong here however, so please be gentle :-)
>
> The "phones" table contains a list of phone numbers and an associated
> customer ID. The radacct table contains a list of all calls made (RADIUS
> Accounting records).
>
> I am doing the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 GROUP BY
> phones.CALLINGSTATIONID;
>
> This query as expected returns 1386 rows (for customer ID 1) which includes
> a number of rows which have a NULL sum as they have not ever connected.
>
> Now, what I want to do is to return the same result set (of 1386 phones),
> but only for a particular time period. I therefore do the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and radacct.ACCTSTARTTIME
> BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' GROUP BY
> phones.CALLINGSTATIONID;
>
> This returns 1280 rows, none of which are have a NULL value for sum. This
> surprised me at first as I thought the WHERE clause should apply before the
> OUTER JOIN but apparently not.

No, in fact it explicitly happens after the join (the order of evaluation
in the theoretical model is basically evaluate the from clause as a table
then apply where on that table to make a new table and so on). You can
either use a subselect in from list or put the condition into the ON to
make its evaluation earlier.

> I then tried the following:
>
> SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
> LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
> radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and
> (radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16
> 15:16:42' OR radacct.ACCTSTARTTIME ISNULL) GROUP BY
> phones.CALLINGSTATIONID;
>
> This query returns 1368 rows, which includes some NULL values for sum,
> however still short of the 1386 rows I am looking for. Close, but no cigar!

Imagine you had
phones (callingstationid = 1, custid = 1)
phones (callingstationid = 2, custid = 1)
phones (callingstationid = 3, custid = 1)
radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00)
radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

Now, I believe the outer join should above give you
(callingstationid = 1, custid = 1, acctstarttime 2006-05-10 00:00:00)
(callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid = 1, acctstarttime NULL)

Then apply the where clause
(callingstationid = 2, custid =1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid =1, acctstarttime NULL)

---

With the subselect in from you'd have
phones (callingstationid = 1, custid = 1)
phones (callingstationid = 2, custid = 1)
phones (callingstationid = 3, custid = 1)
radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00)
radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

The first radacct row doesn't pass, so it's not in the subselect output
which should then look like:
subsel_radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)

Then, I believe the outer join should give you
(callingstationid = 1, custid = 1, acctstarttime NULL)
(callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid = 1, acctstarttime NULL)

And then you apply the where clause again and all the rows go through.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Steve Atkins 2006-08-16 17:51:31 Re: hint unique result fro union
Previous Message Michael Fuhr 2006-08-16 17:38:41 Re: [NOVICE] DB insert Error