Re: PRIMARY KEY on a *group* of columns imply that each column is

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Stephane Bortzmeyer <bortzmeyer(at)nic(dot)fr>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PRIMARY KEY on a *group* of columns imply that each column is
Date: 2005-04-28 12:04:29
Message-ID: 1114689869.12081.129.camel@Frodo.esi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 2005-04-27 at 17:00 +0200, Stephane Bortzmeyer wrote:
> On Wed, Apr 27, 2005 at 09:36:57AM -0500,
> Scott Marlowe <smarlowe(at)g2switchworks(dot)com> wrote
> a message of 18 lines which said:
>
> > Often the best bet here, btw, is to declare it not null then use
> > something other than null to represent null, like the text
> > characters NA or something.
>
> Yes, but it defeats the purpose of NULL. And what should I use as a
> "pseudo-NULL" value for INET? 127.0.0.1? 0.0.0.0? Special values are
> well-known for the problems they raise. That's why many languages have
> NULL-like solutions (None in Python, undef in Perl, Maybe types in
> Haskell, etc).

No. NULL is NOT 'None', nor 'undef', and definitely not NULL as in C.
Those are perfectly defined values, although special ones. Only 'undef'
is quite misleading, but nevertheless it is just _one_ value.
That is, given a variable A, you can always write a boolean expression
that evaluates True or False to test if A is _equal_ to None/undef/NULL
(in C):

$ python -c "a = None; print a == None"
True

$ perl -e 'print a == undef, "\n"'
1

$ cat p.c
#include <stdio.h>

int
main(int argc, char *argv[])
{
char *a = NULL;
printf ("%d\n", a == NULL);
}

$ cc p.c
$ ./a.out
1

About Haskell, I don't know. For what I understand from quick reading
the manual, it'd say that Nothing is similar. You _can_ tell if
something is equal to Nothing.

In databases, NULL has a completely different meaning. It doesn't mean
_no value_, which is just a special (single) value, but it means
_unknown value_. You simply can't compare it with any single value
(even special ones) and expect a boolean answer. The only possible
answer is 'I don't know', which is NULL in boolean. Notice that the
boolean NULL is _not_ the same of False. So you get:

marco=# select 2 = 2;
?column?
----------
t
(1 row)

marco=# select 2 = 3;
?column?
----------
f
(1 row)

marco=# select 2 = NULL;
?column?
----------

(1 row)

that is, neither true nor false. Back to your example, you can compare
('a', 2) with ('a', 2), the result is 't' and thus you've managed
to identify the right row (it works as a primary key).
Also, ('a', 3') is different from ('a', '2'), so you can tell the two
rows are different. But what if you allow ('a', NULL)?

('a', NULL) is neither the same _nor different_ from ('a', 2). The
result of comparison is NULL, no matter how you're testing it:

marco=# select ('a', 2) = ('a', NULL);
?column?
----------

(1 row)

marco=# select ('a', 2) <> ('a', NULL);
?column?
----------

(1 row)

see? _Neither_ one is true. This would completely defeat the purpose of
the primary key. And of course, comparing ('a', NULL) with ('a', NULL)
results in exactly the same:

marco=# select ('a', NULL) = ('a', NULL);
?column?
----------

(1 row)

marco=# select ('a', NULL) <> ('a', NULL);
?column?
----------

(1 row)

That's why NULLs are not allowed in primary keys. The key simply won't
work. NULL in databases is not _one_ special value. It's _any_ value,
since it's unknown. The boolean expression:

2 = NULL

might be true or might be false, since NULL could be _any_ integer in
this expression.

This is completely different from the semantic of None/undef/NULL in
most programming languages.

You wrote:

"Special values are well-known for the problems they raise."

then NULL is definitely _not_ the value you're looking for.
Everything can be said of NULL, but that it is "well-known".

In your case, by choosing (name, address) as the primary key, you're
saying 'I need to know both the name and the address to be able to
retrieve a datum in the table'. This implies that if you have partial
knowledge (you don't know the address), you can't "naturally" retrieve a
single datum (or insert it).

Depending on what you're trying to achieve, you may need to split
the table (normalization the theorists call it). I don't like theory
much, but its conclusions sometimes just make a lot of sense. :-)
Review your design, maybe either the table schema or the choice of the
primary key is not natural for your database.

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2005-04-28 13:14:11 Re: Reduce size of $PGDATA for "demo cdrom"?
Previous Message Sebastian Böck 2005-04-28 09:33:43 Problem with GIST-index and timestamps