Re: surrogate key or not?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Iain" <iain(at)mst(dot)co(dot)jp>, "Markus Bertheau" <twanger(at)bluetwanger(dot)de>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: surrogate key or not?
Date: 2004-07-21 07:00:59
Message-ID: 200407210001.00102.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Markus, Iain,

> Thus I see it more as an issue of business logic than performance. There are
> of course many other considerations with relational theory and stuff like
> that which you could debate endlessly. I expect that googling on "surrogate
> keys" would yeild interesting results.

Frankly, I couldn't disagree more. This is why it was so problematic for the
SQL committee to enshrine "primary keys" and sequences in the standard; it
mis-educates database designers into believing that surrogate keys are
somehow part of the data model. They are most decidely NOT.

Given: Surrogate keys, by definition, represent no real data;
Given: Only items which represent real data have any place in
a data model
Conclusion: Surrogate keys have no place in the data model

There are, in fact, three very good reasons to use surrogate keys, all of
which are strictly due to limitations of technology; that is, implementation
and performance issues, NOT business logic. They are:

1) Convenience: It's very annoying to have to refer to a 4-column foriegn key
whenever you do a join in queries or want to delete a record, as well as
tracking a 4-element composite in your client software.

2) Performance: INT and BIGINT data types are among the most compact and
efficient stored in most RDBMSs. So using anything else as a key would
likely result in a loss of performance on large-table joins.

3) Mutability: Most RDBMSs are very inefficient about CASCADE deletes and
updates. Some RDBMSs do not support CASCADE, forcing the client software to
fix all the dependant rows. This means that DBAs are very reluctant to use
columns which change frequently as join keys.

All three of these implementation issues are, at least in theory,
surmountable. For example, Sybase overcame problems (1) and (3) by creating
an automated, system-controlled hash key based on the table's real key. This
was a solution endorsed by E.F. Codd in the mid-90's when he came to regret
his promotion of the "Primary Key" idea in the SQL standard.

Now, you're probably wondering "why does this guy regard surrogate keys as a
problem?" I'll tell you: I absolutely cannot count the number of "bad
databases" I've encountered which contained tables with a surrogate key, and
NO REAL KEY of any kind. This makes data normalization impossible, and
cleanup of the database becomes a labor-intensive process requiring
hand-examination of each row.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Oliver Elphick 2004-07-21 07:11:27 Re: Inherited tables and new fields
Previous Message sad 2004-07-21 05:01:13 Re: locks and triggers. give me an advice please