Re: Surrogate keys (Was: enums)

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Dann Corbit <DCorbit(at)connx(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Surrogate keys (Was: enums)
Date: 2006-01-19 19:59:11
Message-ID: 20060119195911.GX78403@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 19, 2006 at 11:22:24AM -0800, Dann Corbit wrote:
> > > In any of these either misspellings, changes of names, ownership or
> > > even structure over time render the obvious useless as keys. There
> are
> > > techniques for detecting and reducing duplication but the point is
> that
> > > for any of these duplicates *can* be valid data.
> >
> > Please point me out where, in the writings of E.F. Codd or in the SQL
> > Standard, it says that keys have to be immutable for the life of the
> row.
>
> Only do that for data that you care about. If you think that the data
> has no value, there is no need to have a way to identify a row.

Erm... if you don't care, why are you storing it? :)

> I was a DBA for a database for a company with many millions of customers
> worldwide (e.g. the product registration table was 24 GB).
>
> Their design had natural keys in it. It caused dozens of problems,
> every single day.
>
> I content that most people are not smart enough to decide when a natural
> key is a good idea. The engineers that designed the database were
> probably pretty smart, since it sort of worked and had thousands of
> tables and hundreds of millions of rows in it. But one bad decision on
> a natural key will cause literally millions of dollars of damage.
>
> The primary defense I have heard so far is that the Oids are hard to
> understand. They are nothing in comparison with understanding what to
> do when you have 25 changes to primary keys on various tables every
> single day.
>
> Once you get used to Oids, I find it hard to believe that any
> intelligent person finds them confusing. Confusion resulting from
> having primary keys that are a moving target? Now that's confusion for
> you.

Well, I wouldn't use OIDs as in the PostgreSQL OID, but I agree. If
nothing else an ID gives you a fallback... if you absolutely can't find
a customer (or whatever else) through natural keys, you ask them for
their customer ID/number, which has no reason to ever change.

BTW, if you want to see a mess*, take a look at the distributed.net stats
code, which unfortunately uses email as the means to identify
participants. It made perfect sense originally, anyone running the
client was bound to have an email address, and they all had to be
unique, right? Worked great until the first person contacted us
wondering how to change his email address in stats because he'd changed
ISPs. If you look at todays statscode (at least the database portion of
it) approximately 50% of it is there to deal with people retiring one
email address into another, and I'd say that 90%+ of the bugs are in
this code. Had we just required new users to register to get a nice
shiny unique numeric ID (or a unique username...), none of that code
would exist.

* note that I'm not trying to rag on any of the numerous people who've
been involved in the stats code over the years, but it is insightful to
look at some of the 'dumb mistakes' that have been made and the large
amount of pain that it's caused.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2006-01-19 20:03:17 Re: No heap lookups on index
Previous Message Jim C. Nasby 2006-01-19 19:48:18 Re: Surrogate keys (Was: enums)