Proposed feature: Selective Foreign Keys

From: Tom Dunstan <pgsql(at)tomd(dot)cc>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposed feature: Selective Foreign Keys
Date: 2013-11-28 01:15:04
Message-ID: 85A0355E-E76C-4B2D-8FB6-7353CA537A3E@tomd.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all!

The Problem
-----------------
One case that traditional SQL doesn't handle very well is when you have a child entity which can be attached to a number of different parent entities. Examples might be comments, tags or file attachments - we might have 20 different entities in the system that we would like our users to be able add comments to, but the existing solutions for mapping this all have downsides.

Existing solution 1: Join tables ahoy
If I can have a list of comments on every other object in the system, and I want to have referrential integrity, then the obvious thing to do is create a join table between each entity and the comments table.
Pros:
- Straight forward, traditional object-with-collection-of-child SQL structure
Cons:
- If a parent object gets deleted here, we can't use foreign keys to delete e.g. a child comment, so we'll have to either explicitly do it as part of our delete logic or have a cleanup process to catch orphans. Or do a dance with delete triggers on the join tables deleting the comment.
- For n entities requiring comments in the system, we need n join tables.If we want both comments and e.g. tags and likes on all of our entities, we now have 3n join tables for what should be some relatively self-contained on-the-side data - this is could be more tables than the entire rest of the system
- It's difficult to create any kind of self-contained component for building applications in this scenario, as it will need to know about every other entity in the system, or be able to poke around inside whatever ORM or data access system that you have to work out what join tables it needs when running queries.

Existing solution 2: Enter the matrix
Instead of having n join tables, let's just mash them all together, with a column per parent object, and a check constraint to force exactly one of those columns to be set.
Pros:
- Less bloat in the number of tables
Cons:
- Doesn't solve orphan problem
- Addition of a new entity which needs comments and we now have to add another column onto it, potentially rewriting the whole thing
- Ugly

Existing solution 3: Embed the matrix
Kinda like the dependency matrix table, except that all the columns referencing potential parent objects we put into the comment table instead.
Pros:
- Everything contained in column table
- No orphans, since cascaded deletes will now delete the actual comment
Cons:
- Comment table now has references to every single type that it may be attached to
- Addition of a new entity and we probably have to rewrite the comment table now

Existing solution 4: Abandon ye all referential integrity
Have a column indicating parent type and another one for the id. In the case of comments this would be directly on the comment table itself. In the case of something like tags that we might expect to be shared between entities, it would be in a single join table.
Pros:
- Pretty self-contained
- Data model which has neither lots of empty columns or lots of tables
- Can make new entities "commentable" without rewriting anything
- Because it's self-contained, can build application components that don't need to know much about the rest of your system. For example this is the approach that the grails taggable and commentable plugins take.
Cons:
- No referential integrity, since we can't have a single column pointing to different tables with existing foreign key infrastructure
- Since there's no real db support for doing things this way, existing ORMs etc don't really know how use a single column to join against multiple different tables based on a discriminator or 'switch' column.

Existing solution 5: Everything's a thing
Make your entity hierarchy have a base level object which can have comments attached, and then everything that you need to be "commentable" has to extend that. You can do that in an ORM, or with table inheritance in the database.
Pros:
- Single top-level thing to hang your data on
Cons:
- You've polluted your object hierarchy just to hang some stuff off of the end of it rather than it being driven by behaviours
- You're going to be paying a performance penalty - everything that extends that base level object will now join against it incessantly, and you now have a global id sequence or whatever that you may not want.

Basically none of the above handle the situation very well. The cleanest is solution 4, but lack of RI sucks.

Feature Proposal: Selective foreign keys.
-------------------------------------------------
Allow foreign keys to have where clauses. The above comment example using solution 4 might then look like then following:

CREATE TABLE comment as (
id bigserial primary key,
content text not null,
parent_entity regclass not null,
parent_id int8
);
ALTER TABLE comment ADD CONSTRAINT comment_blog_fk FOREIGN KEY (parent_id) REFERENCES blog(id) WHERE (parent_entity = ‘blog');
ALTER TABLE comment ADD CONSTRAINT comment_event_fk FOREIGN KEY (parent_id) REFERENCES event(id) WHERE (parent_entity = ‘event');

At this point, the following things should work:
insert into blog(id, title, content) values (10, 'i hate mondays', 'so sad');
insert into event(id, title, location, date) values (20, 'my birthday party', 'local pub', now());
insert into comment(content, parent_entity, parent_id) values ('me too', 'blog', 10);
insert into comment(content, parent_entity, parent_id) values ('I love that pub', 'event', 20);

But the following would barf:
insert into comment(content, parent_entity, parent_id) values ('bad comment', 'blog', 20);
ERROR: insert or update on table "comment" violates foreign key constraint "comment_blog_fk"
Detail: Key (parent_id)=(20) is not present in table "blog".:

Patch
-----
I have attached a proof-of-concept patch for this feature.
What it currently does:
- Allows where clause on FK declaration
- Validates and stores the expression into pg_constraint a la check constraints, using the same validity checks
- Implements logic on insertion to FK table, and update / delete on PK tables as expected.
- Regression tests handling all of the above

Things I know are still to be done, if there is consensus that this feature is worth having:
- Handle update of columns affecting the selection expression. Currently the RI trigger doesn’t fire if the FK column isn’t modified. I’m not sure where the logic for this is implemented.
- pg_dump support
- Update some comments in a few places referring to conbin/consrc columns of pg_constraint as relevant to check constraints only
- Documentation
- There are probably a few scenarios that I’m not testing in the regression tests that I should be.

Implementation Comments
----------------------------
I borrowed the exclusion constraint where clause in the grammar, which results in the where expression needing parentheses. With a bit more work this could possibly be removed, but I’m not too upset about it tbh.

Currently I’m creating a new executor for each call of RI_FKey_check when there is a where clause, and doing other bits of work in there. It’s not 100% clear to me how expensive doing that stuff in there is, and whether it’s worth it or possible to cache some of it.

Stuff added in tablecmds.c was copied from similar stuff elsewhere, but honestly without a huge amount of understanding. Don’t know if I could do without some of it.

Comments?

Cheers

Tom

PS Thanks to Noah Misch who answered a question on the list a while ago which got me over a hump, and to the authors of the wiki pages around submitting a patch.

Attachment Content-Type Size
selective-foreign-keys-v1.patch application/octet-stream 30.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-11-28 01:17:09 Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Previous Message Peter Eisentraut 2013-11-28 01:01:23 Re: doPickSplit stack buffer overflow in XLogInsert?