Re: pervasiveness of surrogate (also called synthetic) keys

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: pervasiveness of surrogate (also called synthetic) keys
Date: 2011-05-03 02:52:23
Message-ID: 4DBF6DE7.7060602@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 03/05/11 10:06, Rob Sargent wrote:

> My wife works (at the sql level) with shall we say "records about
> people". Real records, real people. Somewhere around 2 million unique
> individuals, several million source records. They don't all have ssn,
> they don't all have a drivers license. They don't all have an address,
> many have several addresses (especially over time) and separate people
> have at one time or another lived at the same address.

... and that's before we get into the horror of "what is someone's
name". Which name? Which spelling? Do they even have a single canonical
name? Is their canonical name - if any - expressable in the character
set used by the service? Is it even covered by Unicode?!? Does it make
any sense to split their name up into the traditional
english-speaking-recent-western "family" and "given" name parts? Is
there a single consistent way to do so for their name even if it does? etc.

SSN? What if they don't live in the US or aren't a citizen? What if they
have more than one SSN (yes, it happens)? Or there's one being shared by
more than one person (again, this happens) and they can't get it fixed
or don't want to?

My mother's postal address - before street numbering was introduced a
few years ago on her road - used to be "Rural Delivery Area 2,
SomeTownName, New Zealand". You'd be amazed how many systems could not
cope with that; she used to have to register all sorts of things to her
parents' address in the nearby town.

People decide to key a database on US Social Security number - because
it's a "unique natural key" then discover the exciting problems with
that. While they're battling those, business needs change and the
database starts needing to accept people from other countries, who don't
have a US SSN and don't know what one is or what it looks like.

Tracking people/companies in databases are ideal candidates for
synthetic keying with a solid split/merge procedure to handle
duplicates, overlapping identity records, etc.

--
Craig Ringer

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Greg Smith 2011-05-03 03:07:55 Re: pervasiveness of surrogate (also called synthetic) keys
Previous Message Dann Corbit 2011-05-03 02:49:46 Re: pervasiveness of surrogate (also called synthetic) keys