Re: Surrogate keys (Was: enums)

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Surrogate keys (Was: enums)
Date: 2006-01-18 14:53:54
Message-ID: 87slrlwprx.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Leandro Guimarães Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm> writes:

> Greg Stark <gsstark <at> mit.edu> writes:
>
> > I hate knee-jerk reactions too, but just think of all the pain of people
> > dealing with databases where they used Social Security numbers for primary
> > keys. I would never use an attribute that represents some real-world datum as
> > a primary key any more.
>
> I am not familiar with the situation.

The US gov't handed out unique numbers to every worker for their old age
pension program. Many early database designers thought that made a wonderful
natural primary key.

It turns out that:

a) not everyone has a social insurance number: when their business expanded to
include foreign nationals these databases had to make up fake social insurance
numbers.

b) Occasionally people's social insurance numbers change, either because they
got it wrong in the first place or because of identity theft later on. Even
dealing with it changing isn't good enough because the old records don't
disappear; the person essentially has *two* social insurance numbers.

c) For security reasons it turns out to be a bad idea to be passing around
social insurance numbers in the first place. So these database designers had a
major problem adapting when people started refusing to give them social
insurance numbers or complaining when their application leaked their social
insurance number.

In short, what seemed like the clearest possible example of a natural primary
key became a great example of how hard it is to deal with changing business
requirements when you've tied your database design to the old rules. Using
natural primary keys makes an iron-clad design assumption that the business
rules surrounding that datum will never change. And the one thing constant in
business is that business rules change.

In the past I've used "username" as a primary key for a users table, what
could be safer?

Later we had to create a sequence generated userid column because some data
partners couldn't handle an text column without corrupting it. And of course
one day the question arose whether we could handle someone wanting to change
their username. Then another day we were asked whether we could have two
different people with the same username if they belonged to separate branded
subsites.

--
greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-01-18 15:37:59 Re: Bad estimate on LIKE matching
Previous Message Jonah H. Harris 2006-01-18 14:15:04 Re: Unique constraints for non-btree indexes