Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: "Pilling, Michael" <Michael(dot)Pilling(at)dsto(dot)defence(dot)gov(dot)au>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]
Date: 2011-06-20 07:16:47
Message-ID: 4DFEF3DF.5080003@postnewspapers.com.au
Lists: pgsql-bugs

On 20/06/11 14:19, Jeff Davis wrote:
> On Mon, 2011-06-20 at 10:00 +0930, Pilling, Michael wrote:
>> The behaviour of the generated code may well be correct (and I agree
>> that it is), but from everything you and the detailed documentation
>> have said, column != NULL is at least deprecated and is highly likely
>> to indicate a programming error.
>
> The right side of the expression may be an expression as well; e.g.:
> a != b (or a <> b)
>
> The DBMS would not know that one side is NULL until runtime.

If he's talking specifically about the case of a NULL literal, a
parser-level warning could be emitted because the parser *does* know it
was a literal NULL.

I'm not convinced it's a good idea to warn about this case myself, but
for a NULL literal it's at least vaguely practical.
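For anyone following along, the literal-NULL gotcha itself is easy to reproduce. A quick sketch (using SQLite through Python's sqlite3 purely as a convenient stand-in; it follows the same three-valued comparison rules as PostgreSQL here, and the table and column names are made up):

```python
import sqlite3

# SQLite shares SQL's three-valued logic for NULL comparisons, so it
# works as a quick stand-in for PostgreSQL in this demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

# 'col != NULL' evaluates to NULL for every row, and a WHERE clause
# only keeps rows where the condition is true -- so nothing matches,
# not even the rows where col is clearly not NULL.
wrong_rows = con.execute("SELECT col FROM t WHERE col != NULL").fetchall()
print(wrong_rows)  # []

# 'IS NOT NULL' is the test the user almost certainly meant.
right_rows = con.execute(
    "SELECT col FROM t WHERE col IS NOT NULL"
).fetchall()
print(right_rows)  # [(1,), (2,)]
```

This is exactly the case where the parser could warn: the NULL is a literal, visible at parse time.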

> It may be possible to make a static analysis "safety check" tool to warn
> users about dangerous constructs like that, but it would be a fairly
> major effort (and would probably just end up telling you to put COALESCE
> everywhere).

To be even remotely useful, it'd have to be able to prove that certain
variables cannot be NULL in certain places. For example in this trivial
made-up case:

SELECT a FROM tablename
WHERE a IS NOT NULL AND b IS NOT NULL
GROUP BY a, b HAVING a > b;

... the HAVING condition ("a > b" without excluding or handling NULL)
looks like a dangerous test, but in fact the WHERE clause has already
excluded the potentially problematic tuples, so the comparison can
never see a NULL. Warning on that test would be incorrect, and adding a
NULL check / CASE / COALESCE to it would make the query marginally
slower and significantly less readable.

In some places it is not possible to handle NULL inputs explicitly
without evaluating an expensive function or subquery more than once.
That not only has performance implications, but can be a real
correctness issue if the expression uses volatile functions or
functions with side effects.
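To make the double-evaluation point concrete, here is a sketch (sqlite3 again as a stand-in; the `expensive` function and its call counter are invented for the demo). Handling NULL explicitly with CASE evaluates the function once for the test and again for the result:

```python
import sqlite3

calls = 0

def expensive(x):
    # Stand-in for an expensive or side-effecting function.
    global calls
    calls += 1
    return x

con = sqlite3.connect(":memory:")
# Registered as non-deterministic (the default), so the engine must
# evaluate every textual occurrence of the function separately.
con.create_function("expensive", 1, expensive)

# The explicit NULL check forces two calls: one for the IS NULL test,
# one for the ELSE branch.
row = con.execute(
    "SELECT CASE WHEN expensive(7) IS NULL THEN 0 "
    "ELSE expensive(7) END"
).fetchone()
print(row[0], calls)
```

For a genuinely volatile function, those two calls could also return different values or fire the side effect twice, which is the correctness problem.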

As if that weren't complicated enough, to be useful in the real world
the tool would probably have to work on SQL embedded in source code,
including C and Java, and probably on code that assembles SQL from
fragments.

All in all, it strikes me as a vast amount of work, of questionable
feasibility, for little to no gain.

> The bottom line is that NULLs are a little on the dangerous side. If you
> think your example is bad, consider the semantics of NOT IN with respect
> to NULL -- that's a trap even for experts. If you want to be safe, make
> liberal use of COALESCE and WHERE x IS NOT NULL on any expression that
> you think might ever evaluate to NULL.

Agreed. I don't think anyone is a total fan of NULL and SQL's
three-valued logic, or would argue that it's properly consistent and ...
logical. Unfortunately, we're kind of stuck with it because of the SQL
standards process.
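The NOT IN trap Jeff mentions is easy to demonstrate, too (sqlite3 once more as a stand-in; PostgreSQL behaves the same way, and the table here is invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(2,), (None,)])

# 1 NOT IN (2, NULL): the engine cannot prove 1 <> NULL, so the whole
# predicate evaluates to NULL rather than true, and the row is filtered
# out -- even though 1 is obviously not among the known values.
trap = con.execute("SELECT 1 WHERE 1 NOT IN (SELECT x FROM t)").fetchall()
print(trap)  # []

# Excluding the NULLs first restores the intuitive answer.
safe = con.execute(
    "SELECT 1 WHERE 1 NOT IN (SELECT x FROM t WHERE x IS NOT NULL)"
).fetchall()
print(safe)  # [(1,)]
```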

Other solutions to the no-value problem are pretty darn ugly in
different ways, anyway. In particular, throwing an error when any
operation is performed on an undef/unknown value is one possibility that
gets old REALLY fast.

--
Craig Ringer
