Re: BUG #1528: Rows returned that should be excluded by WHERE clause

From: Peter Wright <pete(at)flooble(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1528: Rows returned that should be excluded by WHERE clause
Date: 2005-03-11 05:45:18
Message-ID: 20050311054518.GA5375@cartman.flooble.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers


Hi Tom, others,

First I must say that I appreciate the effort you've invested already
into finding the best "correct" solution. It's very encouraging. :)

I think I understand your analysis of the problem being that HAVING is
erroneously optimised/simplified to WHERE in some cases - and so the
initial "bug" I reported is technically the correct behaviour(?).

....Okay, maybe I'm not completely sure I've understood you correctly. :)

On 08/03 03:07:13, Tom Lane wrote:
> "Peter Wright" <pete(at)flooble(dot)net> writes:
> > Description: Rows returned that should be excluded by WHERE clause
>
> Interesting point. The view and union don't seem to be the issue;

I think the view _is_ the issue (well, at least for me and my
(limited) understanding of how things should work :)). See below.

> I think the problem can be expressed as
[ snip ]
> Now, if this were a WHERE clause, I think the answer would be right:
>
> regression=# select 2 as id, max(b) from t2 where 2 = 1;
> id | max
> ----+-----
> 2 |
> (1 row)
>
> but since it's HAVING I think this is probably wrong.
[ ... ]

On 08/03 12:14:35, Tom Lane wrote:
> "Gill, Jerry T." <JTGill(at)west(dot)com> writes:
> > Just an interesting side note here, this behavior is identical to
> > DB2. I am not sure if that makes it correct or not, but here is an
> > example.
> > [gill(at)c2n2 gill]$ db2 "select 2 as id, max(apn3) from phoenix.client where 2 =1"
>
> > ID 2
> > ----------- ------
> > 2 -
>
> > 1 record(s) selected.
>
> In the WHERE case I think there's no question that the above is
> correct: WHERE is defined to filter rows before application of
> aggregates, so zero rows arrive at the MAX aggregate, and that means
> it produces a NULL.

Now this _does_ make sense - but in the case of a view (or
sub-select), how can it be correct that a WHERE _outside_ the view can
affect the behaviour of that view?

At the very least I'd call that grossly anti-intuitive:

----------------------------------------------------------------------
test1=# create table tab1 ( a integer, b integer );
CREATE TABLE
test1=# insert into tab1 values ( 1, 1 );
INSERT 118421921 1
test1=# insert into tab1 values ( 2, 2 );
INSERT 118421922 1
test1=# create view qry1 as select 2 as id, max(b) as b from tab1;
CREATE VIEW
test1=# create table tab2 ( id integer, b integer );
CREATE TABLE
test1=# insert into tab2 values ( 2, 2);
INSERT 118421931 1
test1=# select * from tab2;
id | b
----+---
2 | 2
(1 row)

test1=# select * from qry1;
id | b
----+---
2 | 2
(1 row)

test1=# select * from qry1 where id = 1;
id | b
----+---
2 |
(1 row)

test1=# select * from tab2 where id = 1;
id | b
----+---
(0 rows)

test1=#
----------------------------------------------------------------------

You say, "WHERE is defined to filter rows before application of
aggregates", but I'd _think_ that should be interpreted to apply only
to aggregates in the _current_ query (ie. not in sub-queries).

In my example just above, I'd _expect_ the view should be fully
evaluated and the results (of that view) treated as though it were
just another table.

Perhaps I'm just showing my limited experience with database theory
here :-), but if you can explain why it makes sense that WHERE must be
applied before aggregation in _all_ subqueries, that'd be good *wry grin*.

> But HAVING is supposed to filter after aggregation, so I think
> probably there should be no row out in that case.

I have no problem with this.

> regards, tom lane

Thanks again for your efforts with this issue, Tom.

Currently I'm working around it by adding an extra kludge-clause,
effectively "AND b is not null", but it'd be preferable to have
Postgres do the Right Thing(tm), whatever that might be.... :)

Pete.
--
http://akira.apana.org.au/~pete/
And anyway, we know that 2 + 2 = 5, for very large values of two...

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2005-03-11 06:48:47 Re: BUG #1528: Rows returned that should be excluded by WHERE clause
Previous Message John R Pierce 2005-03-11 00:32:53 Re: We are not following the spec for HAVING without GROUP

Browse pgsql-hackers by date

  From Date Subject
Next Message Nicolai Tufar 2005-03-11 05:58:11 Re: [pgsql-hackers-win32] snprintf causes regression tests to fail
Previous Message Bruce Momjian 2005-03-11 04:10:12 Re: [pgsql-hackers-win32] snprintf causes regression