Re: There can be only one! How to avoid the "highlander-problem".

From: Erwin Brandstetter <brsaweda(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: There can be only one! How to avoid the "highlander-problem".
Date: 2007-06-05 01:54:42
Message-ID: 1181008482.859185.111800@q69g2000hsb.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Lew!

Thank you for your comments. I have elaborated on them.

On Jun 3, 7:22 pm, Lew <l(dot)(dot)(dot)(at)nospam(dot)lewscanon(dot)com> wrote:
(...)
> The trouble with this is that it models "kingship" as an attribute of every
> man. (What, no female rulers allowed?)

Yeah, saddening, isn't it? Actually, for simplicity's sake I
restricted my model to a "male, monarchistic world".

> The overhead of being "not king" is
> carried in every "mankind" record. This may suffice for your particular model,
> but if you were designing for evolution you'd have a problem. Every new
> attribute of "mankind" would need a new column in the table - "isDuke",
> "isNoble", "isHogSlopCleaner".

You are right, of course. (I switch to "nation" instead of "people" in
my examples like you did, as the term seems clearer.)

However, in your SQL model, you extracted nationality instead of
kingship. If every man has to be member of exactly one nation (which I
postulate), nationality can reside with the man. (we need
man.nation_id instead of nation.man_id) That leaves only the kingship
to be allocated. I postulate further that a king only be king of his
own people (rules out multiple kingships, too). So the "king" needs
only to have 1 attribute: man_id.
To make room for other roles, as you mentioned, I include a role_id.
However, roles must be as unique like the kingship. To enforce
uniqueness of one king (or other role) per nation I include the
seemingly redundant nation_id and impose a UNIQUE (nation_id, role_id)
on it.
To enforce that a man can only become king of his own people, I wrap
both (man_id, nation_id) in a FOREIGN KEY constraint on "man".
PostgreSQL therefore requires a corresponding (redundant) UNIQUE
(nation_id, role_id) on "man".
!NOTE that I do NOT reference table "nation", so we have no circular
foreign-key constraints!

0.) Lets number the models:

CREATE TABLE nation
(
nation_id INTEGER PRIMARY KEY
);

CREATE TABLE man
(
man_id INTEGER PRIMARY KEY,
nation_id INTEGER NOT NULL REFERENCES nation (nation_id) ON UPDATE
CASCADE ON DELETE CASCADE
);

CREATE TABLE role -- "role" is non-reserved word in postgresql or
SQL2003, but reserved in SQL99
(
man_id INTEGER PRIMARY KEY REFERENCES man (man_id) ON UPDATE
CASCADE ON DELETE CASCADE,
nation_id INTEGER,
role_id INTEGER,
UNIQUE (nation_id, role_id)
FOREIGN KEY (man_id, nation_id) REFERENCES man (man_id, nation_id)
ON UPDATE CASCADE ON DELETE CASCADE
);

This makes sense if we have a lot of men per nation and an unknown
number of unique roles per nation. I will simplify this model step by
step now, along with simplified conditions:

1.) First, lets get rid of multiple roles. My model only needs
kingship. So I replace table "role" with the following table
"king" (the rest is the same). :

CREATE TABLE king
(
king_id INTEGER PRIMARY KEY REFERENCES man (man_id) ON UPDATE
CASCADE ON DELETE CASCADE,
nation_id INTEGER UNIQUE,
FOREIGN KEY (man_id, nation_id) REFERENCES man (man_id, nation_id)
ON UPDATE CASCADE ON DELETE CASCADE
);

2.) Now we can further simplify the structure. Skip the table "king"
and merge kingship as an attribute into table "man". This makes sense
with one (or a small number of ) known role(s).
Adds a field to _every_ man and gets rid of one tuple per king and the
overhead for that extra table.
Whether this is preferable over 1.) depends on the typical number of
men per nation. If there is more than just a few, you should stick to
1.). If there is only a few, however, you gain something.
Note, how we reference nation(nation_id) twice (!), but only one time
is NOT NULL.
We are still avoiding circular references.

CREATE TABLE man
(
man_id INTEGER PRIMARY KEY,
nation_id INTEGER NOT NULL REFERENCES nation (nation_id) ON UPDATE
CASCADE ON DELETE CASCADE,
king_id INTEGER UNIQUE REFERENCES nation (nation_id) ON UPDATE
CASCADE ON DELETE CASCADE,
CHECK ((nation_id = king_id)) -- needed to make sure a man
can only become king of his own people.
);

3.) As an improvement over 2.) we can merge kingship into nation (as
you suggested).
Note the "ON DELETE SET NULL" clause, that allows a king to die.
Actually I would pass on kingship to another man (or NULL if none are
left) per trigger, much like in my initial post:
"trg_mankind_delaft()".
Note also that king_id isn't "NOT NULL", so we need to be prepared for
nations without a king (king_id IS NULL).
To enforce a king we'd set it "NOT NULL DEFAULT 0", but then we'd need
a dummy man with man_id = 0 to serve referential integrity and that's
where the circular references begin to bite. Because the dummy man
needs a nation first. This could only be solved by entering a dummy
nation and a dummy man before enforcing referential integrity.
We also need triggers BEFORE INSERT AND UPDATE to check that the king
is member of the nation
IF NEW.king_id IS NOT NULL AND nation_id IS DISTINCT FROM
NEW.nation_id FROM man WHERE man_id = NEW.king_id THEN
RAISE EXCEPTION 'Usurper!';
END IF;
Now we have to store only one field per nation and not per man.

CREATE TABLE nation
(
nation_id INTEGER PRIMARY KEY
king_id INTEGER REFERENCES man (man_id) ON UPDATE CASCADE ON DELETE
SET NULL
);

CREATE TABLE man
(
man_id INTEGER PRIMARY KEY,
nation_id INTEGER NOT NULL REFERENCES nation (nation_id) ON UPDATE
CASCADE ON DELETE CASCADE
);

4.) Finally, we can simplify the model 2.) in another way and avoid
the circular references. As man.king_id only ever references the same
nation.nation_id as man.nation_id, we can substitute the integer with
a (smaller) boolean and get rid of the FOREIGN KEY constraint and the
CHECK constraint. We still need to take care of the UNIQUEness of
kingship per nation, though. That's where my initially posted code
example comes in.

Also the overhead of being "not king" has to be carried by every man
instead of every nation, as you mentioned. BUT in my case, there are
fewer men than nations - only 0.7 per nation on average. Yeah, more
than half of the "nations" have no men, and no nation has more than
handful. That's where the analogy to "nation" fails and my example is
a bit misleading. I have chosen it because everyone understands the
"highlander problem" without much explanation: there can be only
one. :)

I think, all of the four models make sense under the right
circumstances. 3.) or 4.) are my choices.

I have implemented 3.) a number of times in the past. It works, but
you have to pay attention to the circular references.
Now I have come up with the new 4.) and it works like a charm. So far.
Even more so, as I have _many_ SELECTs (often only including "nation"
and not so often also "man"), few INSERTs and DELETEs, and rare
UPDATEs.
In the operations on "man", >99% of the INSERTs only need to operate
on one tuple, and all of them only need to manipulate the much smaller
"man"-table. As there is more trigger work in 4.) versus more checking
of referential integrity in 3.), I am not really sure, which one is
the winner, though. Overhead for UPDATEs or SELECTs should be roughly
the same as for 3.).
As for SELECTS, 3.) should be slightly faster than 4.), but no real
nation join man on nation.king_id = man.man_id
nation join man USING (nation_id) WHERE king
And storage (in my case): 0.7 x 1 Byte clearly beats 1 x 4 Bytes.
(half of which are NULL though, so use only a bit of the NULL-bitmask)

OK, I may have gotten carried away. Hopefully all of this is of any
use to anyone - or at least of interest to you, Lew.

Regards
Erwin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2007-06-05 02:36:28 Re: what to do when pg_cancel_backend() doesnt work?
Previous Message Carlos H. Reimer 2007-06-05 01:10:43 PostgreSQL abnormally terminating with signal 5