Skip site navigation (1) Skip section navigation (2)

Re: pervasiveness of surrogate (also called synthetic) keys

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Greg Smith <greg(at)2ndQuadrant(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: pervasiveness of surrogate (also called synthetic) keys
Date: 2011-05-03 02:06:38
Message-ID: 4DBF632E.8090401@gmail.com (view raw or flat)
Thread:
Lists: pgsql-general

Jeff Davis wrote:
> On Mon, 2011-05-02 at 11:10 -0400, Greg Smith wrote:
>> The position Merlin 
>> has advocated here, that there should always be a natural key available 
>> if you know the data well enough, may be true.  But few people are good 
>> enough designers to be sure they've made the decision correctly, and the 
>> downsides of being wrong can be a long, painful conversion process.  
>> Easier for most people to just eliminate the possibility of making a 
>> mistake by using auto-generated surrogate keys, where the primary 
>> problem you'll run into is merely using more space/resources than you 
>> might otherwise need to have.  It minimizes the worst-case--mistake make 
>> in the model, expensive re-design--by adding overhead that makes the 
>> average case more expensive.
>
> Once you really try to define "natural" and "surrogate" keys, I think a
> lot of the arguments disappear. I wrote about this a few years back:
>
> http://thoughts.j-davis.com/2007/12/11/terminology-confusion/
>
> In particular, I think you are falsely assuming that a natural key must
> be generated from an outside source (or some source outside of your
> control), and is therefore not reliably unique.
>
> You can generate your own keys, and if you hand them out to customers
> and include them on paperwork, they are now a part of the reality that
> your database models -- and therefore become natural keys. Invoice
> numbers, driver's license numbers, etc., are all natural keys, because
> they are known about, and used, in reality. Usernames are, too, the only
> difference is that you let the user choose it.
>
> In contrast, a pointer or a UUID typically does not represent reality,
> because no humans ever see it and no computer systems outside yours know
> about it. So, it's merely an implementation detail and should not be a
> part of the model.
>
> Regards,
> 	Jeff Davis
>
My wife works (at the sql level) with shall we say "records about 
people".  Real records, real people.  Somewhere around 2 million unique 
individuals, several million source records.  They don't all have ssn, 
they don't all have a drivers license.  They don't all have an address, 
many have several addresses (especially over time) and separate people 
have at one time or another lived at the same address.  You would be 
surprise how many "bob smith"s where born on the same day.  But then 
they weren't all born in a hospital etc etc etc.  A person may present 
on any of a birth record, a death record, a hospital record, a drivers 
license, a medical registry, a marriage record and so on.  There simply 
is no natural key for a human.  We won't even worry about the 
non-uniqueness of ssn. And please don't get her started on twins. :) 


I can only imagine that other equally complex entities are just as 
slippery when it comes time to pinpoint the natural key.

rjs

In response to

Responses

pgsql-general by date

Next:From: Dann CorbitDate: 2011-05-03 02:49:46
Subject: Re: pervasiveness of surrogate (also called synthetic) keys
Previous:From: Josh BerkusDate: 2011-05-03 01:56:06
Subject: We need you to beta-test PostgreSQL 9.1

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group