Re: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results

From: Marc SanF <marcsf23(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results
Date: 2008-02-13 23:59:57
Message-ID: 735645.75571.qm@web53609.mail.re2.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I respectfully challenge that the aggregation is correct.

In the where clause, I specify A2.AUDIT_TYPE_CODE = CONTENT_2, thus returning only 2 rows for A2 and not all of the rows in A2 which happen to have a TXN_COUNT of 1 / row but could in fact be any positive number. I used 1 for simplicity. Similarly, if you take out A1 from the query you receive the following result:

audit_date | content_policy_name | sum_2
2008-01-01 | TEST POLICY | 2

I do not see how/why a self-join changes the condition specified in the where clause and thus returns a sum of 8 rows that do not meet the specified condition in the query?

Thanks in advance,
Marcus Torres

----- Original Message ----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Marcus Torres <marcsf23(at)yahoo(dot)com>; pgsql-bugs(at)postgresql(dot)org
Sent: Wednesday, February 13, 2008 7:40:32 AM
Subject: Re: [BUGS] BUG #3958: Self-Join Group-By Clause Produces Incorrect Results

"Heikki
Linnakangas"
<heikki(at)enterprisedb(dot)com>
writes:
>
Marcus
Torres
wrote:
>>
I
wrote
a
simple
self-join
query
to
sum
the
transaction
count
of
different
>>
types
of
records
in
a
audit
table
and
the
result
set
for
the
different
sum
>>
totals
was
the
same
which
is
incorrect.

>
Looks
perfectly
correct
to
me.

Me
too.
The
underlying
data
before
grouping/aggregation
is

regression=#
select



A1.AUDIT_DATE,
P.CONTENT_POLICY_NAME,
A1.TXN_COUNT,
A2.TXN_COUNT
FROM
T_AUDIT
A1,



T_AUDIT
A2,



T_POLICY
P
WHERE
P.ID
=
A1.POLICY_ID

AND
P.ID
=
A2.POLICY_ID

AND
A1.POLICY_ID
=
A2.POLICY_ID

AND
A1.AUDIT_DATE
=
A2.AUDIT_DATE

AND
A1.AUDIT_TYPE_CODE
=
'CONTENT_1'

AND
A2.AUDIT_TYPE_CODE
=
'CONTENT_2';

audit_date
|
content_policy_name
|
txn_count
|
txn_count
------------+---------------------+-----------+-----------

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1
(10
rows)

from
which
it's
clear
that
given
all
ones
in
txn_count,
the
sums
*must*
be
the
same
because
they're
taken
over
the
same
number
of
rows.

I
suspect
what
the
OP
needs
is
two
separate
queries
(perhaps
union'ed
together)
not
a
self-join.




regards,
tom
lane

____________________________________________________________________________________
Never miss a thing. Make Yahoo your home page.
http://www.yahoo.com/r/hs

Browse pgsql-bugs by date

  From Date Subject
Next Message Marc SanF 2008-02-14 00:05:50 Re: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results
Previous Message Tom Lane 2008-02-13 20:06:10 Re: psql malloc problem