Re: Basic Q on superfluous primary keys

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
Subject: Re: Basic Q on superfluous primary keys
Date: 2007-04-16 18:24:15
Message-ID: 4623BF4F.5070308@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Craig A. James wrote:
> Merlin Moncure wrote:
>> Using surrogate keys is dangerous and can lead to very bad design
>> habits that are unfortunately so prevalent in the software industry
>> they are virtually taught in schools. ... While there is
>> nothing wrong with them in principle (you are exchanging one key for
>> another as a performance optimization), they make it all too easy to
>> create denormalized designs and tables with no real identifying
>> criteria, etc,...
>
> Wow, that's the opposite of everything I've ever been taught, and all my
> experience in the last few decades.
>
> ...chemistry...two companies using the same "natural"
> keys had as much as 10% differences in their multi-million-compound
> databases. These errors led to six-month to year-long delays, as each
> of the conflicting chemical record had to be examined by hand by a PhD
> chemist to reclassify it.

That sounds almost like a feature, not a bug - giving information
about what assumptions that went into the "natural key" need to be
reconsidered.

And I don't see how it would have been improved by adding a surrogate
key - except that the data would have been just as messed up though
harder to see where the messups were.

> We've always recommended to our customers that all primary keys be
> completely information free. They should be not based on any
> information or combination of information from the data records. Every
> time the customer has not followed this advice, they've later regretted it.

Hmm... but then do you put a unique index on what the
otherwise-would-have-been-natural-primary-key columns?

If not, you tend to get into the odd situation of multiple
rows that only vary in their surrogate key -- and it seems
the surrogate key is redundant.

> I'm sure there are situations where a natural key is appropriate, but I
> haven't seen it in my work.

I've seen both - and indeed usually use surrogate keys for convenience;
but also find that having to fix incorrect assumptions in natural primary
keys tends to raise business issues that are worth addressing anyway.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Steve 2007-04-16 22:13:54 Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
Previous Message Merlin Moncure 2007-04-16 15:55:29 Re: Basic Q on superfluous primary keys