Quick Links

Re: pervasiveness of surrogate (also called synthetic) keys

From:	Misa Simic <misa(dot)simic(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: pervasiveness of surrogate (also called synthetic) keys
Date:	2011-05-04 19:39:02
Message-ID:	BANLkTinmu-qN_hnWKhMtmWD5KDeDWc_XOQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

> Being the “first line” or the “second line” of a physical invoice is a
> property for that line. Identifying its position on the invoice is only
> natural.

> Specifically, the position of the line on the invoice; you can't have to
> invoice lines at the second line of aninvoice for example. This
> information is visible and important to users because presumably the
> invoice lines as entered into an application are in the order you would
> want them printed out on aphysical invoice.

I agree that Line No could say, that is order in which Item is added to
Invoice - but definately it will not mean that it should be printed in that
order... Items could be grouped by Category or whatever on Invoice... in
which order Items will be printed I would say it is more Business Rule...
and as all Business Rules it is changeable by time... for example 1 rule
known to me: SLA item (charge for Service Licence Agreement) is always on
the end, and its price is 20% of other items SubTotal for what SLA should be
paid (so for some Items on Invoice, SLA should be paid, for some not - we
don't know in which order user will enter Items)...

> By your reasoning all identifiers are synthetically generated if you
> consider there is never truly only a single instance of anything in the
> multi-verse. The only truly unique identifier would be the time+place of an
> objects creation.
>
>
>
> “Hello - person born in Liverpool London, St. Whatever hospital, Room 101 @
> 13:14:57AM on the 5th of March 2001 – how may I direct your call?” (I
> guess you could use the conception date as well although twins+ might be
> tough to distinguish in that case).
>

Yes, I agree, I really think that "Natural" keys do not exist... but
definatelly there are Entity properties, like e-mail, InvoiceNo (in whatever
format), what must be Unique (if Business Rule say: that property must be
unique).... That is even hard for humans to think and identify someone on
that way - so humans invented SSN - to make them life easier... but it is
known that SSN is not best canditate for PK...

How humans identify someone/something - is totally different way then how
Relational Database identifies some record...

Human better Identify some person by picture (if it is known to them, if
have it in their memory) - Relational Database, well, today could by 1
digitial picture and just that one... Tomorrow maybe...

If you key on email and the query coming from another table doesn't
> need any other email properties, you just saved yourself a join
> without having to de-normailze in the classic sense. You also get to
> cut out many sorts on similar principles.

Performance and Reliability - that are main things I worry about... and then
made some tests...

JOIN - that is natural thing for Relational Database... Especially JOIN on
Keys - I agree that JOIN on non key columns are expensive...

I can't imagine, that we can get some info from ERP system without join on
every day basis, like just e-mail, Amount as salary in that period...
without First Name, Last Name... but lets do the test

So we have two tables, in two models...

Employees table: 50 000 records

Salaries table: 100 000 records

(For a first test, I didn't have time to wait to fill table with 100 000 000
records... but will run it over night...)

Model one:
Employee
-e-mail (PK)
-FirstName
-LastName

Salary (PK: e-mail, StartDate, EndDate)
-e-mail (FK to employee)
-StartDate
-EndDate
-Amount

(I recommend StartDate/EndDate to replace with Period datatype from many
reasons - but out of this test topic)

Model two:
Employee
-EmpID (UUID -PK)
-e-mail
-FirstName
-LastName

Salary
-SalID (UUID PK)
-EmpID (UUID FK to Employee)
-StartDate
-EndDate
-Amount

Test for Model 1:

SELECT e-mail, startdate, enddate, Amount FROM Salary

Returns 100 000k records in 1000 ms

SELECT Salary.e-mail, FirstName, LastName, startdate, enddate, Amount FROM
Salary INNER JOIN Employees ON Salary.e-mail = Employees.e-mail

Returns 100 000k records in 240 ms

*WOW - query with JOIN returned same result 4 times faster!!!*
*
*
just tried then:

SELECT Salary.e-mail, FirstName, LastName, startdate, enddate, Amount FROM
Salary INNER JOIN Employees ON Salary.e-mail = Employees.e-mail

all info from Salary table but just joined to emploees without columns from
Employees table...

Returns 100 000k records in 240 ms

test for Model2:

SELECT emp_id, startdate, enddate, amount FROM Salary

Returned 100 000k rows in 1314 ms (worst result - though in practice that
query would never been run because of emp_id as GUID means nothing to
human...)

SELECT email, fname, lname, startdate, enddate, amount FROM Salary INNER
JOIN Employees ON Salary.emp_id = Employees.emp_id

Returns 100 000k records in 240 ms (the same as JOIN on e-mail column in
Model1)

> Like many systems of
> the day, it was written in COBOL over an isam data store which didn't
> have the technical capability to do what sequences or uuids do today

I agree. Seq or uuids solved many problems today...and perfectly fits in
Relational Database design as PK...

With Kind Regards,

Misa

In response to

Re: pervasiveness of surrogate (also called synthetic) keys at 2011-05-04 14:32:15 from Merlin Moncure

Browse pgsql-general by date

	From	Date	Subject
Next Message	Merlin Moncure	2011-05-04 20:31:14	Re: ZEOS or PGDAC - How to lock a resource?
Previous Message	David Johnston	2011-05-04 19:28:24	GROUP BY Wildcard Syntax Thought