Re: surrogate vs natural primary keys

From: Steve Midgley <science(at)misuse(dot)org>
To: pgsql-sql(at)postgresql(dot)org
Cc: Seb <spluque(at)gmail(dot)com>
Subject: Re: surrogate vs natural primary keys
Date: 2008-09-18 04:10:28
Message-ID: 20080918050121.2715D64FCBF@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


>To: pgsql-sql(at)postgresql(dot)org
>From: Seb <spluque(at)gmail(dot)com>
>Subject: Re: surrogate vs natural primary keys
>Date: Mon, 15 Sep 2008 17:56:31 -0500
>Organization: Church of Emacs
>Lines: 20
>Message-ID: <87hc8h2f34(dot)fsf(at)patagonia(dot)sebmags(dot)homelinux(dot)org>
>References: <87y71t2hlu(dot)fsf(at)patagonia(dot)sebmags(dot)homelinux(dot)org>
> <dcc563d10809151545r70d7d2a7j25931d7557dd88f3(at)mail(dot)gmail(dot)com>
>X-Archive-Number: 200809/101
>X-Sequence-Number: 31553
>
>On Mon, 15 Sep 2008 16:45:08 -0600,
>"Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> wrote:
>
>[...]
>
> > I think this question is a lot like "how large should I set
> > shared_buffers?" There's lots of different answers based on how
> you
> > are using your data.
>
>Yes, this is precisely what I'm after: *criteria* to help me decide
>which approach to take for different scenarios. Such guidance is what
>seems to be lacking from most of the discussions I've seen on the
>subject. It's hard to distill this information when most of the
>discussion is centered on advocating one or the other approach.

I think Scott and others have laid out the main ideas in a very
cool-headed way already, but here's my follow-on input:

I agree with Andrew Sullivan that using industry standard id's as your
primary key can be problematic. But I do sometimes apply unique indices
to such "industry standard" columns to ensure they are in fact unique
and can be a surrogate for the "real" integer/serial primary key.

As a rule, I have decided to stay away from "meaningful" (natural)
primary keys for these reasons:

1) They sometimes change b/c of business rule changes, forcing
technical changes to the relationship model, when only internal table
schema changes should be required to support the new business
requirements.

2) Generating arbitrary/surrogate keys is easier b/c you can use
sequence generators. (When creating a new record, I have to figure out
the value of a meaningful column before saving the record which
sometimes I don't want to do!)

3) Surrogate keys are guaranteed unique regardless of semantic content
of the table.

4) All tables can all join to each other in the same ways: property.id
holds the same data type as contact.id. All id fields are the same in
type/format.

I think there's even a reasonable argument for "globally unique"
surrogate keys: all keys for any table use the same sequence of id's. I
implemented a system in the 90's that used globally unique id's and it
opened up some interesting solutions that I wouldn't have thought of
when I started the project (self joins were the same as foreign joins
since the id's in both entities were guaranteed unique).

I've heard some people argue the use of GUID's for id's but I've been
too scared to try that in a real system.

Sequential, arbitrary primary keys (as surrogate keys) are predictable
though. So if you share those keys with the public (via URL's for
example), then competitors can learn information about your business
(how fast keys are generated for a certain table for example).

That's an argument for random, arbitrary primary keys though, not for
compound/meaningful keys.

I think natural or compound keys make more sense to DBA's and let you
implement some kinds of database solutions more quickly.

All in all, I don't really understand the merits of natural keys
outside of data warehouse applications. In data warehouses, in my
experience, compound natural keys just end up turning into fact tables!
:)

In summary: I've never heard someone say they've been bitten by using
an arbitrary surrogate key system, but I myself have been bitten and
have heard lots of stories of problems when using natural keys.

I hope this helps some,

Steve

Browse pgsql-sql by date

  From Date Subject
Next Message Raphael Bauduin 2008-09-18 09:49:26 inserting boolean values in dynamic queries
Previous Message Robert Edwards 2008-09-17 23:28:16 Re: May I have an assistance on CREATE TABLE Command