Re: pervasiveness of surrogate (also called synthetic) keys

From: Dann Corbit <DCorbit(at)connx(dot)com>
To: 'Rob Sargent' <robjsargent(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Greg Smith <greg(at)2ndQuadrant(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: pervasiveness of surrogate (also called synthetic) keys
Date: 2011-05-03 02:49:46
Message-ID: 87F42982BF2B434F831FCEF4C45FC33E4220E778@EXCHANGE.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> owner(at)postgresql(dot)org] On Behalf Of Rob Sargent
> Sent: Monday, May 02, 2011 7:07 PM
> To: Jeff Davis
> Cc: Greg Smith; pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] pervasiveness of surrogate (also called
> synthetic) keys
>
>
>
> Jeff Davis wrote:
> > On Mon, 2011-05-02 at 11:10 -0400, Greg Smith wrote:
> >> The position Merlin
> >> has advocated here, that there should always be a natural key
> available
> >> if you know the data well enough, may be true. But few people are
> good
> >> enough designers to be sure they've made the decision correctly, and
> the
> >> downsides of being wrong can be a long, painful conversion process.
> >> Easier for most people to just eliminate the possibility of making a
> >> mistake by using auto-generated surrogate keys, where the primary
> >> problem you'll run into is merely using more space/resources than
> you
> >> might otherwise need to have. It minimizes the worst-case--mistake
> make
> >> in the model, expensive re-design--by adding overhead that makes the
> >> average case more expensive.
> >
> > Once you really try to define "natural" and "surrogate" keys, I think
> a
> > lot of the arguments disappear. I wrote about this a few years back:
> >
> > http://thoughts.j-davis.com/2007/12/11/terminology-confusion/
> >
> > In particular, I think you are falsely assuming that a natural key
> must
> > be generated from an outside source (or some source outside of your
> > control), and is therefore not reliably unique.
> >
> > You can generate your own keys, and if you hand them out to customers
> > and include them on paperwork, they are now a part of the reality
> that
> > your database models -- and therefore become natural keys. Invoice
> > numbers, driver's license numbers, etc., are all natural keys,
> because
> > they are known about, and used, in reality. Usernames are, too, the
> only
> > difference is that you let the user choose it.
> >
> > In contrast, a pointer or a UUID typically does not represent
> reality,
> > because no humans ever see it and no computer systems outside yours
> know
> > about it. So, it's merely an implementation detail and should not be
> a
> > part of the model.
> >
> > Regards,
> > Jeff Davis
> >
> My wife works (at the sql level) with shall we say "records about
> people". Real records, real people. Somewhere around 2 million unique
> individuals, several million source records. They don't all have ssn,
> they don't all have a drivers license. They don't all have an address,
> many have several addresses (especially over time) and separate people
> have at one time or another lived at the same address. You would be
> surprise how many "bob smith"s where born on the same day. But then
> they weren't all born in a hospital etc etc etc. A person may present
> on any of a birth record, a death record, a hospital record, a drivers
> license, a medical registry, a marriage record and so on. There simply
> is no natural key for a human. We won't even worry about the
> non-uniqueness of ssn. And please don't get her started on twins. :)
>
>
> I can only imagine that other equally complex entities are just as
> slippery when it comes time to pinpoint the natural key.

People are sometimes surprised to discover the social security numbers are not unique.

There are fraudulent social security numbers:
http://www2.nbc4i.com/news/2010/dec/06/2/study-finds-millions-duplicate-social-security-num-ar-316988/

There are numbers given out by the IRS by mistake:
http://wnyt.com/article/stories/S1594530.shtml?cat=10115

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2011-05-03 02:52:23 Re: pervasiveness of surrogate (also called synthetic) keys
Previous Message Rob Sargent 2011-05-03 02:06:38 Re: pervasiveness of surrogate (also called synthetic) keys