Re: BUG #14235: inconsistencies with IS NULL / IS NOT NULL

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14235: inconsistencies with IS NULL / IS NOT NULL
Date: 2016-07-22 22:27:32
Message-ID: 87zip9qti4.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> The executor, when doing IS [NOT] NULL on a composite value, looks
>> at each column to see if it is the null value. It does NOT recurse
>> into nested composite values, and my reading of the spec suggests
>> that this is correct.

Tom> Hmm. Of course the $64 question is whether that really is
Tom> correct, or sensible.

Tom> I went to look at the spec, and discovered that SQL:2011 actually
Tom> has wording that is different from SQL99, which I think is what we
Tom> relied on last time we considered this issue.

I've been mainly using the 2008 spec, but it seems to have identical
wording to the 2011 one.

My reading is as follows, based on these parts of the spec:

[Framework] 4.4.2 The null value
Every data type includes a special value, called the null value,
sometimes denoted by the keyword NULL. This value differs from other
values in the following respects: [...]

[Foundation] 8.8 <null predicate> (as you quoted)

It seems to me that where the spec refers to "the null value", this is
exactly what we do with the "isnull" flag, and that <null predicate> is
a separate thing.

Where the spec refers to the "fields" of a row, it at least strongly
implies that these are never taken recursively:

[Framework] 4.4.5.2 Row types
A row type is a sequence of one or more (field name, data type)
pairs, known as fields. A value of a row type consists of one value
for each of its fields.

Also in 6.2 <field definition> the "degree" of a row type is defined in
such a way that a row type with one declared field, whose type is also a
row type, has a degree of 1. So the comment in 8.8 about degree-1 rows
only makes sense if 8.8 isn't supposed to recurse.

Plus, if 8.8 had been intended to be recursive, it could easily have
been written that way.

Most of the references to nullity in the spec talk specifically about
"the null value" rather than referencing IS [NOT] NULL. In particular,
constructs like count(x) refer to "eliminating the null value", which in
turn implies that count(x) is not the same thing as count(*) filter
(where x is not null) if x is a row type. Also, IS DISTINCT FROM only
refers to "the null value", so (x IS DISTINCT FROM NULL) is not the same
as either (x IS NOT NULL) or (NOT (x IS NULL)). Likewise the rules for
<routine invocation> only talk about "the null value" in deciding
whether a routine declared RETURNS NULL ON NULL INPUT (i.e. STRICT) is
actually called.

The places that explicitly use IS [NOT] NULL are:

- some definitions about whether values are known-not-null, which is
ok because (x IS NOT NULL) clearly implies that x is not the null
value

- some rules for multiset operations where the operands must be
multisets, not rows

- NOT NULL constraints on columns are transformed into CHECK (col IS
NOT NULL), which isn't what we do: our not-null constraint currently
only excludes the null value, though this has been discussed in some
past threads

- COALESCE is implemented as CASE WHEN x IS NOT NULL THEN ... which
again is not what we do. The logic of the spec's version of this is
truly bizarre, and I can't imagine any possible use for it; having
COALESCE(row(1,null),row(2,null)) return (2,null) as the spec
demands seems to be all kinds of wrong.

Tom> <null predicate> ::= <row value predicand> <null predicate part 2>
Tom> <null predicate part 2> ::= IS [ NOT ] NULL

Tom> (Oddly, SQL does not seem to allow IS [NOT] NULL on non-composite
Tom> values, which is just silly.)

<row value predicand> can be a <row value constructor predicand> which
can be a <common value expression> or a <boolean predicand>, either of
which can be a <value expression primary>, and in both of these cases
the syntax rules specify that they are to be replaced by ROW(X). So
non-composite values are allowed and are indistinguishable from
composite values of degree 1.

(Note that <row value predicand> can also be a <row value special case>
which can be a <nonparenthesized value expression primary>, but that is
required to be of a row type. The spec here seems to rely on constraints
about the declared types to disambiguate between the case of a <row
value special case> and a <common value expression> which is a
<nonparenthesized value expression primary>.)

Tom> Rule (2a) was not there in SQL99. But look at what this is doing:
Tom> it is admitting straight out that a null composite value is not
Tom> the same as a composite value all of whose fields are null. It is
Tom> only asserting that a <null predicate> will not distinguish them.

Right, that's my reading too.

Tom> The implication is that it's just fine if, say, COALESCE() doesn't
Tom> act that way.

Not COALESCE, because COALESCE is defined as a syntactic transform to a
CASE expression that uses IS NOT NULL.

But I really don't think we should follow the spec on that one.

>> It seems possible that this could be fixed by simply setting
>> argisrow=false for all the null tests generated in such cases.

Tom> I concur that this is an appropriate fix if we believe that
Tom> ExecEvalNullTest's behavior is correct.

I believe it is, as explained above.

That said, there may also be merit in arguing that the spec is just
broken on this point. I think back when IS NULL was originally changed
to conform to the spec, it might have been better to just say "no,
that's silly" and stick with the old behavior.

We wouldn't strictly speaking be violating the spec if we did that,
because we don't claim to support feature T051 "Row types", and so we
can just say that our row types are a nonstandard extension without the
standard IS NULL behavior. We do in fact violate the spec here in many
ways; for example we treat types from CREATE TYPE foo AS (fields) as
being row types, but in fact the spec says these are user-defined
structured types and not row types.

But whether it would be a good idea to actually revert to the old
way... that's another question. Does anyone actually _want_ the spec
behavior of IS [NOT] NULL?

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2016-07-22 23:40:15 Re: BUG #14235: inconsistencies with IS NULL / IS NOT NULL
Previous Message Tom Lane 2016-07-22 20:21:44 Re: BUG #14235: inconsistencies with IS NULL / IS NOT NULL