Re: When is a record NULL?

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: When is a record NULL?
Date: 2009-07-26 00:40:31
Message-ID: 20090726004030.GO5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 25, 2009 at 11:42:04AM -0500, Kevin Grittner wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> > In this dichotomy a NULL is most definitely a value and with my
> > current experience I don't understand the distinction you're trying
> > to draw.
>
> There can be a place where a value *could* go which does not contain a
> value. Codd considered it crucial, from a mathematical correctness
> point of view, that the absence of a value not be indicated by some
> special "magic value", but rather by some other technique which
> indicates that there *is* no value there. In SQL this is done with
> NULL. Based on reading his books, it seems to me that Codd always
> seemed uncomfortable with this, since it made it appear to be some
> special value, which he was adamant that it is *not*.

Have you used more academic languages like ML or Haskell? Their
"option" and "Maybe" types, respectively, provide the nicest practical
treatment I've seen of this.

> It seems he
> would have preferred a relational language use a term like "FLAGGED AS
> MISSING" rather than "IS NULL". It also would have allowed the
> flexibility to differentiate various types of missing values, such as
> "FLAGGED AS UNKNOWN" or "FLAGGED AS NOT APPLICABLE".

Those sorts of lists get domain specific *very* quickly. I think
languages are better off providing a type system of enough complexity to
express maybe types and letting users invent whatever is most useful for
the job at hand.

> I do think
> that set logic in relational data involves some slightly different
> twists on things than most language have.

You must live in a very different world from me then! :)

> I tend, for bettor or
> worse, to come down in agreement with the positions Codd espoused on
> most of these things.

I've not read much of his writings, any canonical references for this
sort of discussion?

> > [PG] ... internally knows there is a distinction
> > between the two but it doesn't like to expose this.
>
> Well, to some extent I think it's a tough problem, since the set logic
> of a relational database is implemented in C, which doesn't have the
> same concepts. There's got to be a little slight of hand in there
> somewhere.

That's a pretty bad excuse; everything ends up as machine code
in the end. Many languages expose very abstract and consistent
views of things, some of them a lot more awkward than that of a
relational database. PG is admittedly hampered by a desire to follow a
particularly "innovative" standard and correctness for PG commonly means
keeping data safe. Being logically consistent normally takes a minor
role, although there is quite a lot of overlap between the two.

> > If your model is correct then when the IS DISTINCT FROM operator
> > works on RECORDs the following should return FALSE for all of the
> > following:
> >
> > SELECT NULL IS DISTINCT FROM ROW(NULL);
> > SELECT NULL IS DISTINCT FROM ROW(NULL,NULL);
> > SELECT NULL IS DISTINCT FROM ROW(NULL,ROW(NULL,NULL));
> > SELECT ROW(NULL) IS DISTINCT FROM ROW(NULL,ROW(NULL,NULL));
> >
> > i.e. there is *no* difference between a NULL record and a record
> > consisting entirely of NULLs.
>
> Well, on that I would go with whatever the SQL standard says, and hope
> it's not too ambiguous. (I haven't tried to sort though this one in
> the standard, so far.)

I've had a reasonable browse around an old copy of SQL'08 I found, but
couldn't find much helpful. The definition of distinct (3.1.6.8 in my
copy) seems particularly vacuous, it would seem to leave it down to how
rows are constructed and I can't find many details of that.

> I was going into the theory both because it is
> the basis for some of the seemingly odd aspects of SQL, and because at
> least half the time I see someone put the word NULL immediately in
> front of the word VALUE, they are wandering into confusion on these
> issues. (I will admit that using such technically incorrect language
> is sometimes hard to avoid without sounding stilted, even if all
> parties to the conversation know that NULL is *not* a value.)

I think that depends on what definition of "value" you're using. If
you're considering it to be a value like a NULL pointer then I'd agree
as this is just a convention to treat "pointers with a value of zero"
specially. If you treat values as members of a set with the set defined
as their type and a sub-type relation existing between types then a NULL
value is the only member of an unnamed type (in SQL) that's the subtype
of all other types. There are other ways of formalizing this, and I've
probably explained it badly here, but it's a rough sketch of how I think
about it.

> I know that Codd was insistent that any relation (which included the
> result of any query) which could contain duplicate rows should be
> called a "corrupted relation". (In fact, in one of his books I think
> he averaged a comment on this point about once every two pages.) So I
> shudder to think what his reaction would be to a relation with a row
> which contained no values. I have a really hard time figuring out
> what useful information such a row could represent.

There's a *big* difference between what's allowed in a table and what
can come back from an arbitrary calculation. Think about what "b" would
be in the following:

SELECT b
FROM a LEFT JOIN b ON FALSE;

SQL does say "A table is a multiset of rows. A row is a nonempty
sequence of values." PG still needs some way of representing the above,
and the example David gave while I was writing this.

--
Sam http://samason.me.uk/

[1] http://hol.sourceforge.net/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-07-26 00:41:12 Re: [PATCH] DefaultACLs
Previous Message Joshua Tolley 2009-07-26 00:39:23 Re: [PATCH] DefaultACLs