Re: Surrogate keys (Was: enums)

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Leandro Guimar??es Faria Corcete DUTRA <leandro(at)dutra(dot)fastmail(dot)fm>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Surrogate keys (Was: enums)
Date: 2006-01-18 19:11:50
Message-ID: 20060118191150.GL17896@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 18, 2006 at 01:08:53PM +0000, Leandro Guimar??es Faria Corcete DUTRA wrote:
> > b) If each parent record will have many children, the space savings from
> > using a surrogate key can be quite large
>
> Not such a common case.

Hmmm...

Many blog entries per user... Many blog comments per entry

Many PO's per customer... many line items per PO...

Etc., etc. I would argue that one-many relationships are far more common
than one-one, and it's very common for an integer ID to be a more
compact representation than a real key.

> > c) depending on how you view things, putting actual keys all over the
> > place is denormalized
>
> How come? Never!

Huh?

One of the tenants of normalization is that you don't repeat data. You
don't use customer name in your PO table, because it's asking for
problems; what if a customer changes names (as just one example).

> > Generally, I just use surrogate keys for everything unless performance
> > dictates something else.
>
> What I am proposing is the reverse: use natural keys for everything unless
> performance dictates something else.
>
> In support of my PoV:
> http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1

Read the bottom of it:

"I am not saying that you should avoid autonumber surrogate keys like an
SCO executive. The danger is not in their use but in their abuse. The
"events_id" column in the "events" table didn't give us any trouble
until we began to rely on it as the sole key for the table. The
accounting application gave us problems because we were using the ID as
the entire handle for the records. That crossed the line from use to
misuse, and we suffered for it."

To paraphrase, the issue isn't that surrogate keys were used for RI; the
issue is that proper keys were not setup to begin with. Does it make
sense to have a customer table where customer_name isn't unique? Almost
certainly not. But that's just one possible constraint you might put on
that table. To put words in Josh's mouth, the issue isn't with using a
surrogate key, it's with not thinking about what constraints you should
be placing on your data.

Take a look at cbk's comment; he does a great job of summing the issue
up.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2006-01-18 19:21:34 log_min_messages and debug levels
Previous Message Bruce Momjian 2006-01-18 17:45:29 Re: debug_query_string and multiple statements