Re: Allow foreign keys to reference a superset of unique columns

From: Wolfgang Walther <walther(at)technowledgy(dot)de>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kaiting Chen <ktchen14(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, hellopfm(at)gmail(dot)com
Subject: Re: Allow foreign keys to reference a superset of unique columns
Date: 2022-09-25 08:49:15
Message-ID: 2ef8dbe1-4e2e-25af-218e-0b5efae8f649@technowledgy.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

James Coleman:
> If I'm following properly this sounds like an overengineered EAV
> schema, and neither of those things inspires me to think "this is a
> use case I want to support".
>
> That being said, I know that sometimes examples that have been
> abstracted enough to share aren't always the best, so perhaps there's
> something underlying this that's a more valuable example.

Most my use-cases are slightly denormalized and I was looking for an
example that didn't require those kind of FKS only because of the
denormalization. So that's why it might have been a bit artifical or
abstracted too much.

Take another example: I deal with multi-tenancy systems, for which I
want to use RLS to separate the data between tenants:

CREATE TABLE tenants (tenant INT PRIMARY KEY);

Each tenant can create multiple users and groups:

CREATE TABLE users (
"user" INT PRIMARY KEY,
tenant INT NOT NULL REFERENCES tenants
);

CREATE TABLLE groups (
"group" INT PRIMARY KEY,
tenant INT NOT NULL REFERENCES tenants
);

Users can be members of groups. The simple approach would be:

CREATE TABLE members (
PRIMARY KEY ("user", "group"),
"user" INT REFERENCES users,
"group" INT REFERENCES groups
);

But this falls short in two aspects:
- To make RLS policies simpler to write and quicker to execute, I want
to add "tenant" columns to **all** other tables. A slightly denormalized
schema for efficiency.
- The schema above does not ensure that users can only be members in
groups of the same tenant. Our business model requires to separate
tenants cleanly, but as written above, cross-tenant memberships would be
allowed.

In comes the "tenant" column which solves both of this:

CREATE TABLE members (
PRIMARY KEY ("user", "group"),
tenant INT REFERENCES tenants,
"user" INT,
"group" INT,
FOREIGN KEY ("user", tenant) REFERENCES users ("user", tenant),
FOREIGN KEY ("group", tenant) REFERENCES groups ("group", tenant)
);

This is not possible to do right now, without adding more UNIQUE
constraints to the users and groups tables - on a superset of already
unique columns.

>> bar.y is a little bit like a generated value in that sense, it should
>> always match foo.b. I think it would be great, if we could actually go a
>> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
>> exists, I would like to set bar.y automatically to the new foo.b.
>> Otherwise those kind of updates always have to either query foo before,
>> or add a trigger to do the same.
>
> Isn't this actually contradictory to the behavior you currently have
> with a multi-column foreign key? In the example above then an update
> to bar.x is going to update the rows in foo that match bar.x = foo.a
> and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
> new values.

No, I think there was a misunderstanding. An update to bar should not
update rows in foo. An update to bar.x should update bar.y implicitly,
to match the new value of foo.b.

> You seem to be suggesting that instead it should look for
> other rows that already match the *new value* of only one of the
> columns in the constraint.

Yes. I think basically what I'm suggesting is, that for an FK to a
superset of unique columns, all the FK-logic should still be done on the
already unique set of columns only - and then the additional columns
should be mirrored into the referencing table. The referencing table can
then put additional constraints on this column. In the members example
above, this additional constraint is the fact that the tenant column
can't be filled with two different values for the users and groups FKs.
But this could also be a CHECK constraint to allow FKs only to a subset
of rows in the target table:

CREATE TYPE foo_type AS ENUM ('A', 'B', 'C');

CREATE TABLE foo (
f INT PRIMARY KEY,
type foo_type
);

CREATE TABLE bar (
b INT PRIMARY KEY,
f INT,
ftype foo_type CHECK (ftype <> 'C'),
FOREIGN KEY (f, ftype) REFERENCES foo (f, type);
);

In this example, the additional ftype column is just used to enforce
that bar can only reference rows with type A or B, but not C. Assume:

INSERT INTO foo VALUES (1, 'A'), (2, 'B'), (3, 'C');

In this case, it would be nice to be able to do the following, i.e.
derive the value for bar.ftype automatically:

INSERT INTO bar (b, f) VALUES (10, 1); -- bar.ftype is then 'A'
UPDATE bar SET f = 2 WHERE b = 10; -- bar.ftype is then 'B'

And it would throw errors in the following cases, because the
automatically derived value fails the CHECK constraint:

INSERT INTO bar (b, f) VALUES (20, 3);
UPDATE bar SET f = 3 WHERE b = 10;

Note: This "automatically derived columns" extension would be a separate
feature. Really nice to have, but the above mentioned FKs to supersets
of unique columns would be very valuable without it already.

Best

Wolfgang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Wolfgang Walther 2022-09-25 09:08:10 Re: has_privs_of_role vs. is_member_of_role, redux
Previous Message Julien Rouhaud 2022-09-25 06:56:03 Re: Schema variables - new implementation for Postgres 15