Re: BUG #6669: unique index w/ multiple columns and NULLs

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org, jo <jose(dot)soares(at)sferacarta(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #6669: unique index w/ multiple columns and NULLs
Date: 2012-06-04 16:36:37
Message-ID: 20120604163637.GF2352@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Jun 04, 2012 at 11:26:20AM -0500, Kevin Grittner wrote:
> > I think the original complaint is that NULL != NULL in a WHERE
> > clause, but GROUP BY is able to group them together just fine.
>
> Whoa! I think I know what you meant, but that is a dangerously
> misleading misstatement. It is not true that NULL = NULL, but
> neither is it true that NULL != NULL. In fact, you also don't get
> TRUE from NOT NULL = NULL. If you don't know either value, you
> can't know that they are equal, and you can't know that they are
> unequal. The results of such comparisons are UNKNOWN. That doesn't
> mean you would always find the count of rows where the value is NULL
> uninteresting; hence the IS NOT DISTINCT FROM concept is effectively
> used for grouping.

Well, thinking of this from a procedural language perspective, a new
value comes in and GROUP BY has to figure out if it already has a
matching value. If a NULL comes in, anything you compare it to is NULL,
even another NULL, so what is the logic that allows these nulls to be
placed in the same group? Every value to compare it to returns NULL.
I assume IS DISTINCT FROM is the answer.

> Performing logical or set operations on data sets with missing
> values is an inherently tricky business, but I think that overall
> SQL has made reasonable choices on how to do that; my biggest gripe
> is that there is no standard way to distinguish between UNKNOWN and
> NOT APPLICABLE. The fuzziest areas seem to me to be related to
> that deficiency. As long as NULL is not abused for such things as
> "known to be zero" in an accounting record (which is subtly but
> significantly different from "not applicable"), NULL is almost
> always (IMO) better than some "magic value". If you have ever
> converted data from a database where names were split into multiple
> fields, and NULL was not allowed for middle name, you will probably
> agree.

Agreed, we could use another null-type value, though NULLs are already
so confusing that I am sure adding another one just caused too many
groans in the room when it was suggested.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2012-06-04 20:33:05 Re: Calling xlst_process with certain arguments causes server crash
Previous Message Kevin Grittner 2012-06-04 16:26:20 Re: BUG #6669: unique index w/ multiple columns and NULLs