Re: Allow foreign keys to reference a superset of unique columns

From: Wolfgang Walther <walther(at)technowledgy(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kaiting Chen <ktchen14(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, hellopfm(at)gmail(dot)com
Subject: Re: Allow foreign keys to reference a superset of unique columns
Date: 2022-09-02 09:42:25
Message-ID: 85a79730-5778-a7ed-5ca4-cadf6e45e44d@technowledgy.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kaiting Chen:
> I'd like to propose a change to PostgreSQL to allow the creation of a foreign
> key constraint referencing a superset of uniquely constrained columns.

+1

Tom Lane:
> TBH, I think this is a fundamentally bad idea and should be rejected
> outright. It fuzzes the semantics of the FK relationship, and I'm
> not convinced that there are legitimate use-cases. Your example
> schema could easily be dismissed as bad design that should be done
> some other way.

I had to add quite a few unique constraints on a superset of already
uniquely constrained columns in the past, just to be able to support FKs
to those columns. I think those cases most often come up when dealing
with slightly denormalized schemas, e.g. for efficiency.

One other use-case I had recently, was along the followling lines, in
abstract terms:

CREATE TABLE classes (class INT PRIMARY KEY, ...);

CREATE TABLE instances (
instance INT PRIMARY KEY,
class INT REFERENCES classes,
...
);

Think about classes and instances as in OOP. So the table classes
contains some definitions for different types of object and the table
instances realizes them into concrete objects.

Now, assume you have some property of a class than is best modeled as a
table like this:

CREATE TABLE classes_prop (
property INT PRIMARY KEY,
class INT REFERNECES classes,
...
);

Now, assume you need to store data for each of those classes_prop rows
for each instance. You'd do the following:

CREATE TABLE instances_prop (
instance INT REFERENCES instances,
property INT REFERENCES classes_prop,
...
);

However, this does not ensure that the instance and the property you're
referencing in instances_prop are actually from the same class, so you
add a class column:

CREATE TABLE instances_prop (
instance INT,
class INT,
property INT,
FOREIGN KEY (instance, class) REFERENCES instances,
FOREIGN KEY (property, class) REFERENCES classes_prop,
...
);

But this won't work, without creating some UNIQUE constraints on those
supersets of the PK column first.

> For one example of where the semantics get fuzzy, it's not
> very clear how the extra-baggage columns ought to participate in
> CASCADE updates. Currently, if we have
> CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> then an update that changes only foo.b doesn't need to update
> referencing tables, and I think we even have optimizations that
> assume that if no unique-key columns are touched then RI checks
> need not be made. But if you did
> CREATE TABLE bar (x integer, y integer,
> FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> then perhaps you expect bar.y to be updated ... or maybe you don't?

In all use-cases I had so far, I would expect bar.y to be updated, too.

I think it would not even be possible to NOT update bar.y, because the
FK would then not match anymore. foo.a is the PK, so the value in bar.x
already forces bar.y to be the same as foo.b at all times.

bar.y is a little bit like a generated value in that sense, it should
always match foo.b. I think it would be great, if we could actually go a
step further, too: On an update to bar.x to a new value, if foo.a=bar.x
exists, I would like to set bar.y automatically to the new foo.b.
Otherwise those kind of updates always have to either query foo before,
or add a trigger to do the same.

In the classes/instances example above, when updating
instances_prop.property to a new value, instances_prop.class would be
updated automatically to match classes_prop.class. This would fail, when
the class is different than the class required by the FK to instances,
though, providing exactly the safe-guard that this constraint was
supposed to provide, without incurring additional overhead in update
statements.

In the foo/bar example above, which is just a bit of denormalization,
this automatic update would also be helpful - because rejecting the
update on the grounds that the columns don't match doesn't make sense here.

> Another example is that I think the idea is only well-defined when
> the subset column(s) are a primary key, or at least all marked NOT NULL.
> Otherwise they're not as unique as you're claiming.

I fail to see why. My understanding is that rows with NULL values in the
referenced table can't participate in FK matches anyway, because both
MATCH SIMPLE and MATCH FULL wouldn't require a match when any/all of the
columns in the referencing table are NULL. MATCH PARTIAL is not
implemented, so I can't tell whether the semantics would be different there.

I'm not sure whether a FK on a superset of unique columns would be
useful with MATCH SIMPLE. Maybe it could be forced to be MATCH FULL, if
MATCH SIMPLE is indeed not well-defined.

> It's also unclear to me how this ought to interact with the
> information_schema views concerning foreign keys. We generally
> feel that we don't want to present any non-SQL-compatible data
> in information_schema, for fear that it will confuse applications
> that expect to see SQL-spec behavior there. So do we leave such
> FKs out of the views altogether, or show only the columns involving
> the associated unique constraint? Neither answer seems pleasant.

Instead of tweaking FKs, maybe it would be possible to define a UNIQUE
constraint re-using an existing index that guarantees uniqueness on a
subset of columns already? This would allow to create those FK
relationships by creating another unique constraint - without the
overhead of creating yet another index.

So roughly something like this:

ALTER TABLE foo ADD UNIQUE (a, b) USING INDEX foo_pk;

This should give a consistent output for information_schema views?

Best

Wolfgang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-09-02 09:44:08 Clarify restriction on partitioned tables primary key / unique indexes
Previous Message Dong Wook Lee 2022-09-02 09:34:58 Re: add test: pg_rowlocks extension