Re: Surrogate keys (Was: enums)

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Surrogate keys (Was: enums)
Date: 2006-01-19 21:29:20
Message-ID: 20060119212920.GA17981@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 19, 2006 at 10:09:26AM -0800, Josh Berkus wrote:
> Martjin,
>
> >In any of these either misspellings, changes of names, ownership or
> >even structure over time render the obvious useless as keys. There are
> >techniques for detecting and reducing duplication but the point is that
> >for any of these duplicates *can* be valid data.
>
> Please point me out where, in the writings of E.F. Codd or in the SQL
> Standard, it says that keys have to be immutable for the life of the row.

Possibly nowhere. But when you send invoices to customers, any details
on there *are* immutable. Sure, in your database you don't care if
things change, but then they don't match reality anymore do they?

> Duplicate *values* can be valid data. Duplicate *tuples* show some
> serious flaws in your database design. If you have a personnel
> directory on which you've not bothered to define any unique constraints
> other than the ID column, then you can't match your data to reality. If
> you have two rows with the same first and last name, you don't know if
> they are two different people or the same person, duplicated. Which
> will be a big problem come paycheck time.

I never said there were duplicate tuples, just that the data has no
natural keys. The tuples are unique because there's a surrogate key. It
is entirely possible to have two people with the same first name, last
name and date of birth. Rather uncommon, but the database must be able
to support it.

I don't understand your example though. If you have a personnel
directory with two rows with the same first and last name, what does
that tell you. Nothing. You have to go find out whether there really
are two of those people or not. You can simplify the process by taking
into account the fact that it's very unlikely, but a unique constraint
is not the answer. Besides, it's far more likely the same person will
appear twice with two different spellings of their name. :)

Anyway, the discussion was about surrogate vs natural keys. Nothing
here has convinced me that there are any useful natural keys to be
found in the examples I gave. Most of the examples I gave come from a
system I had to maintain where some designer had assumed there was some
kind of natural key and in *each* and *every* case it caused
problems...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2006-01-19 21:46:29 Re: suppress output for benchmarking
Previous Message Larry Rosenman 2006-01-19 21:26:38 BuildFarm: Do we need another FreeBSD/amd64 member?