Re: RFC: A brief guide to nulls

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: dev(at)archonet(dot)com, pgsql-sql(at)postgresql(dot)org
Subject: Re: RFC: A brief guide to nulls
Date: 2003-01-14 19:46:50
Message-ID: 200301141146.50308.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Richard,

Thanks for writing this! Would it be cool if we included it in the Linux
Weekly News? I'll ask Robert.

Two edits:

On Wednesday 15 January 2003 09:23, dev(at)archonet(dot)com wrote:
> There have been a few posts recently where people have had problems with
> nulls. Anyone got comments on the below before I submit it to techdocs?
>
> TIA
>
> - Richard Huxton
>
> A Brief Guide to NULLs
> ======================
>
> What is a null?
> ===============
> A null is *not* an empty string.
> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.
>
>
> What do nulls mean?
> ===================
> Well, they *should* mean one of two things:
> 1. There is no applicable value
> 2. There is a value but it is unknown
>
> Example 1: Imagine you have a customer table with name and sex fields.
> If you get a new customer "ACME Widgets Ltd", the sex field is meaningless
> since your customer is a company (case 1).
> If you get a new customer "Jackie Smith" they might be male or female, but
> you might not know (case 2).
>
> Example 2: You have an address table with (street,city,county,postalcode)
> fields.
> You might insert an address ("10 Downing Street","London",Null,"WC1 1AA")
> since you don't have a valid county.
> You might also insert an address ("1 Any Street","Maidstone","Kent",Null)
> where there *must be* a valid postalcode, but you don't know what it is.
>
> It might be useful to be able to distinguish between these two cases - not
> applicable and unknown, but there is only one option "Null" available to
> us, so we can't.
>
>
> How do nulls work?
> ==================
> There is one very important rule when dealing with nulls. The result of
> any operation or comparison, when applied to a null is null. The only
> exception is testing if a value is null.
>
> Example: with the customer table above you could run the following queries:
> SELECT * FROM customer WHERE sex='M';
> SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value of sex is 'M'
> and all those with values not equal to 'M' but not rows with *no value at
> all*
>
> It might help to think of a database as a set of statements you *know* to
> be true. A null indicates that you *cannot say anything at all* about that
> field. You can't say what it is, you can't say what it isn't, you can only
> say there is some information missing.
>
> So, to see all the customers with unknown or inapplicable sex you would
need:
> SELECT * FROM customer WHERE sex IS NULL;

** MOVE the above paragraph ....

>
> There are actually three possible results for a test in SQL - True (the
> test passed), False (the test failed) and Null (you tested against a
> null). A result of null usually gets treated as False, so testing against
> nulls always fails.
>
> If you try to perform an operation on nulls, again the result is always
> null. So the results of all of the following are null:
> SELECT 'abc' || null;
> SELECT 1 + null;
> SELECT sqrt(null::numeric);
> The first case can be especially confusing. Concatenating a null string to
> a string value will return null, not the original value.

** ... down to here, and define the IS NULL and IS NOT NULL operators for the
reader.

>
> Uniqueness and nulls
> ====================
> If you define a unique index on a column it prevents you inserting two
> values that are the same. It does not prevent you inserting as many nulls
> as you like. How could it, you don't have a value so it can't be the same
> as any other.
>
> Example: We create a table "ta" with a unique constraint on column "b"
> CREATE TABLE ta (
> a int4,
> b varchar(3),
> PRIMARY KEY (a)
> );
> CREATE UNIQUE INDEX ta_b_idx ON ta (b);
> INSERT INTO ta VALUES (1,'aaa'); -- succeeds
> INSERT INTO ta VALUES (2,'bbb'); -- succeeds
> INSERT INTO ta VALUES (3,null); -- succeeds
> INSERT INTO ta VALUES (4,'bbb'); -- fails
> INSERT INTO ta VALUES (5,null); -- succeeds!
>
> Given the definition of what a null is, you only have two choices: allow
> multiple nulls or allow no nulls. If you want no nulls, define the column
> as NOT NULL when creating the table.
>
>
> Keys and nulls
> ==============
> No column that is part of a primary key can be null. When you define a
> PRIMARY KEY, none of the columns mentioned can take a null value.
> Postgresql makes sure of this by defining the columns as NOT NULL for you.
>
> Example: With table "ta" we just created, \d ta will show column a as
> being not null. Otherwise, we could insert rows with a set to null and
> have no way to tell them apart.
>
>
> Subqueries and nulls
> ====================
> Since tests always fail when testing against nulls you can have unexpected
> results with sub-queries.
>
> Example: Assume we have a companies table and a diary table. Diary entries
> are usually related to a particular company but not always.
> SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM
> diary);
> If any row in diary contains a null dy_company then you will get *no
> results*.
> We can expand the query like so:
> WHERE co_id NOT IN (SELECT dy_company FROM diary)
> WHERE co_id NOT IN (1, 2, null, 3...)
> WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
> WHERE NOT (... OR null OR ...)
> WHERE NOT (null)
> WHERE null

I'd suggest that you do not want to give readers an example that contains NOT
IN (subselect). There are almost no cricumstances under which a user would
choose NOT IN (subselect) over WHERE NOT EXISTS (subselect), and giving it in
an example will just confuse newbies.

> You either need to explicitly check for null values, or define the column
> in question as NOT NULL.
>
>
> Further Information
> ===================
> Any good relational database book (try something written by Date)
> Bruce's book (link LHS at http://techdocs.postgresql.org)
> My Postgresql Notes (link at http://techdocs.postgresql.org)

Add links to the two PostgreSQL book pages ... the one on Techdocs and the one
on www.postgresql.org/users-lounge

FWIW.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message jasiek 2003-01-14 20:02:33 Re: full join in view
Previous Message Josh Berkus 2003-01-14 19:38:43 Re: query speed joining tables