Re: [SQL] Oddities with NULL and GROUP BY

From: Herouth Maoz <herouth(at)oumail(dot)openu(dot)ac(dot)il>
To: pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [SQL] Oddities with NULL and GROUP BY
Date: 1999-05-19 13:52:54
Message-ID: l03130303b3686def8239@[147.233.159.109]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

At 18:28 +0300 on 17/05/1999, José Soares wrote:

> The Pratical SQL Handbook at page 171 says:
> Since nulls represent "the great unknown", there is no way to know
> whether one null is equal to any other null. Each unknown value
> may or may not be different from another.
> However, if the grouping column contains more than one null,
> all of them are put into a single group.
>
> Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.

This is something I have complained about time and again. It is time
something is changed about it, otherwise Postgres will NEVER be a
standard-compliant RDBMS.

The SQL92 text says:

A null value is an implementation-dependent special value that
is distinct from all non-null values of the associated data type.
There is effectively only one null value and that value is a member
of every SQL data type. There is no <literal> for a null value,
although the keyword NULL is used in some places to indicate that a
null value is desired.

Thus, by rights, NULL=NULL should be true, because there is only one null
value.

About the <group by clause>, the text says:

1) The result of the <group by clause> is a partitioning of T into
a set of groups. The set is the minimum number of groups such
that, for each grouping column of each group of more than one
row, no two values of that grouping column are distinct.

And the treatment of nulls is implied from the definition of distinctness:

h) distinct: Two values are said to be not distinct if either:
both are the null value, or they compare equal according to
Subclause 8.2, "<comparison predicate>". Otherwise they are
distinct. Two rows (or partial rows) are distinct if at least
one of their pairs of respective values is distinct. Otherwise
they are not distinct. The result of evaluating whether or not
two values or two rows are distinct is never unknown.

About uniqueness, it says:

A unique constraint is satisfied if and only if no two rows in
a table have the same non-null values in the unique columns. In
addition, if the unique constraint was defined with PRIMARY KEY,
then it requires that none of the values in the specified column or
columns be the null value.

One should note, however, that when the actual comparison operator "=" is
used, the standard says that if one of the operands is null, the result of
the comparison is unknown. One should make a distinction between making
comparisons within group by, uniqueness, and other database-logic
operations, and between making the actual comparison (though in my opinion,
this should not be so. Comparing a null value to something should be always
false unless the other something is also null. But that's my opinion and
not the standard's).

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message The Hermit Hacker 1999-05-19 13:54:02 Re: [HACKERS] PyGreSQL 2.4
Previous Message Constantin Teodorescu 1999-05-19 13:49:57 Broken select on regular expression !!!

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 1999-05-19 13:53:20 Re: [SQL] Why wont this update work?
Previous Message secret 1999-05-19 13:46:52 Re: [SQL] Oddities with NULL and GROUP BY