Skip site navigation (1) Skip section navigation (2)

Re: pervasiveness of surrogate (also called synthetic) keys

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: pervasiveness of surrogate (also called synthetic) keys
Date: 2011-05-02 15:10:38
Message-ID: 4DBEC96E.8070901@2ndQuadrant.com (view raw or flat)
Thread:
Lists: pgsql-general
On 05/01/2011 06:12 PM, Karsten Hilbert wrote:
> Good to know since I'm only a lowly medical doctor not
> having much schooling in database matters beyond this list,
> the PostgreSQL docs, and the Celko book.
>    

This debate exists at all levels of experience, and the only thing that 
changes as you get more experienced people involved is an increase in 
anecdotes on each side.  The sole time I ever found myself arguing with 
Joe Celko is over an article he wrote recommending natural keys, using 
an example from the automotive industry.  Problem was, the specific 
example he gave was flat out wrong.  I was working in automotive MIS at 
the time, and the thing he was saying would never change did, in fact, 
change every year--in only a fraction of a percent of cases, in an 
extremely subtle way that snuck up on people and wreaked much 
confusion.  That's typical for an early natural key design:  you get it 
working fine in V1.0, only to discover months or years down the road 
there's a case you never considered you don't model correctly, and it 
may take some sort of conversion to fix.

The reason why there's a strong preference for surrogate keys is that 
they always work and you can avoid ever needing to come up with a better 
design. if you just use them and forget about it.  The position Merlin 
has advocated here, that there should always be a natural key available 
if you know the data well enough, may be true.  But few people are good 
enough designers to be sure they've made the decision correctly, and the 
downsides of being wrong can be a long, painful conversion process.  
Easier for most people to just eliminate the possibility of making a 
mistake by using auto-generated surrogate keys, where the primary 
problem you'll run into is merely using more space/resources than you 
might otherwise need to have.  It minimizes the worst-case--mistake make 
in the model, expensive re-design--by adding overhead that makes the 
average case more expensive.  Software design usually has enough risks 
that any time you can eliminate one just by throwing some resources at 
it, that's normally the right thing to do.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


In response to

Responses

pgsql-general by date

Next:From: Marek WięckowskiDate: 2011-05-02 15:17:11
Subject: Re: auto-reconnect: temp schemas, sequences, transactions
Previous:From: Raymond O'DonnellDate: 2011-05-02 14:57:59
Subject: Re: wnat ot edit pg_hba.conf file from command prompt

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group