Re: NULL values or not?

From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: Archibald Zimonyi <archie(at)netg(dot)se>, Aasmund Midttun Godal <postgresql(at)envisity(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 16:55:11
Message-ID: web-533913@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Archibald,

> So again, I have no problems with NULLS, I just would like someone to
> share when they use them contra when they don't.

Thank you for bringing a theory question onto the list! We spend much
of our time on pgsql-sql focused on practical minutia. Thus several of
the respondants who siezed on your hypothetical example rather than
answering the question. I'd love to see some of the more experienced DB
theorists weigh in on this issue (Tom? Joe C.?).

Relationally speaking, NULLs are a necessary evil. You are right to
approach them with caution. I cannot count the number of times one of
my database rescue attempts has involved eliminating, sub-tabling, or
consolidating NULLable columns.

By coincidence, yesterday I posted this to PGSQL-NOVICE:

============================================================
> 3) Use of NULL
>
> It would facilitate entering data into TABLE clone if seq, qual...
> were defined as NULL even though values for these columns will/does
> exist. If I defined these columns as NULL I would not have to have
> ALL of the data together at one time, but could enter it in steps.
> Is this an ill conceived notion?

Well, this purpose is what NULL is for. Cases where data is unknown or
not yet available. That being said, allowing NULLs takes data integrity
out of the table design and moves it elsewhere in the software. If you
allow NULLs for seq, then you will need to create a data integrity
report that searches for old entries with NULL in the seq column.
Otherwise, you risk having some required data never filled in.

Also, remember that you can't JOIN on a NULL value. For example, if you
allow NULLs in, say clones.gb_id, then if you do a report on clones
JOIN
gb_accessions, the clones who are missing GB will not show up with a
blank GB, instead they will not show up at all! You can get around this
with OUTER JOINS, but OUTER JOINs are not dependable for multi-column
joins.

Instead, I recommend that everywhere it is possible, you have an actual
value that indicates why the data has not been filled in. FOr example,
you could create a gb_accession with the ID of zero (0) (and accn_no,
gi_no of 0 as well) which would indicate (to you) "gb not run yet".
This gives you more information than NULL (which could indicate a
number
of things: GB not run, GB lost, data error, program error, etc.), as
well as supporting JOINs cleanly. You could even have more than one
such value to indicate different reasons for missing info.
=====================================================

For a more in-depth discussion of NULLs and their problems and
workarounds, see Fabian Pascal's "Practical Issues in Database
Management", which has the better part of a chapter on the topic.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Archibald Zimonyi 2001-12-21 17:22:54 Re: NULL values or not?
Previous Message Archibald Zimonyi 2001-12-21 15:48:50 Re: NULL values or not?