Re: [HACKERS] Severe SUBSELECT bug in 6.5 CVS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Bitmead <chris(dot)bitmead(at)bigfoot(dot)com>
Cc: pgsql-hackers(at)hub(dot)org
Subject: Re: [HACKERS] Severe SUBSELECT bug in 6.5 CVS
Date: 1999-06-26 17:22:50
Message-ID: 26256.930417770@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Chris Bitmead <chris(dot)bitmead(at)bigfoot(dot)com> writes:
> httpd=> select * from a where i not in (select i from b);
> [ returns nothing if b contains any nulls in column i ]

Of course, what's happening here is that the NOT IN is (in effect)
transformed to
a.i != b.i1 AND a.i != b.i2 AND a.i != b.i3 ...
(writing i1, i2, ... for the values extracted from b). Then, since
any comparison involving NULL returns FALSE, the where-clause fails
for all values of a.i.

I think this actually is a bug, not because it's wrong for "x != NULL"
to be false, but because the SQL spec defines "a NOT IN t" as equivalent
to "NOT (a IN t)". IN is implemented as
a.i = b.i1 OR a.i = b.i2 OR a.i = b.i3 ...
which will effectively ignore nulls in b --- it'll return true if and
only if a.i matches one of the non-null values in b. Our implementation
fails to maintain the equivalence that NOT IN is the negation of this.

It appears to me that to follow the SQL spec, a NULL found in a.i
should return NULL for both IN and NOT IN (the spec appears to say that
the result of IN is "unknown" in that case, and we are using NULL to
represent "unknown"):
c) If the implied <comparison predicate> is true for at least
one row RT in T, then "R <comp op> <some> T" is true.
d) If T is empty or if the implied <comparison predicate> is
false for every row RT in T, then "R <comp op> <some> T" is
false.
e) If "R <comp op> <quantifier> T" is neither true nor false,
then it is unknown.
(recall that null compared to anything yields unknown, not false).
I don't believe we currently have that behavior, but it seems
reasonable.

More subtly, it looks like for a non-null a.i, IN should return TRUE
if there is a match in b, even if b also contains nulls (fine), but if
there is no match in b and b contains nulls then the spec seems to
require NULL, *not* FALSE, to be returned! I'm not sure I like that
conclusion...

In the meantime, a workaround for Chris is to use NOT (i IN ...) instead
of NOT IN. That should work as he expects, at least for nulls in b.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-06-26 17:27:59 Re: [HACKERS] regression bigtest needs very long time
Previous Message Tatsuo Ishii 1999-06-26 15:55:55 Re: utf-8