Foreign key joins revisited

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Foreign key joins revisited
Date: 2021-12-25 20:55:51
Message-ID: 5b238fdd-16a2-4b63-a347-c63813ebccb0@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've revisited the idea to somehow use foreign keys to do joins,
in the special but common case when joining on columns that exactly match a foreign key.

The idea is to add a new ternary operator, which would be allowed only in the FROM clause.

It would take three operands:

1) referencing_table_alias
2) foreign_key_constraint_name
3) referenced_table_alias

POSSIBLE BENEFITS

* Eliminate risk of joining on the wrong columns
Although probably an uncommon class of bugs, a join can be made on the wrong columns, which could go undetected if the desired row is included by coincidence, such as if the test environment might only contain a single row in some table, and the join condition happened to be always true.
By joining using the foreign key, it can be verified at compile time, that the referenced_table_alias is actually an alias for the table referenced by the foreign key. If some other alias would be given, an error would be thrown, to avoid failure.

* Conciser syntax
In a traditional join, you have to explicitly state all columns for the referencing and referenced table.
I think writing joins feels like you are repeating the same table aliases and column names over and over again, all the time.
This is especially true for multiple-column joins.
This is somewhat addressed by the USING join form, but USING has other drawbacks, why I tend to avoid it except for one-off queries.
When having to use fully-qualified table aliases, that adds even further to the verboseness.

* Makes abnormal joins stand out
If joining on something else than foreign key columns, or some inequality expression, such joins will continue to be written in the traditional way, and will therefore stand out and be more visible, if all other foreign key-based joins are written using the new syntax.
When reading SQL queries, I think this would be a great improvement, since the boring normal joins on foreign keys could be given less attention, and focus could instead be made on making sure you understand the more complex joins.

* Explicit direction of the join
In a traditional join on foreign key columns, it's not possible to derive if the join is a one-to-many or many-to-one join, by just looking at the SQL code itself. One must also know/inspect the data model or make assumptions based on the naming of columns and tables. This is perhaps the least interesting benefit though, since good naming makes the direction quite obvious anyway. But I think it at least reduces the total cognitive load of reading a SQL query.

POSSIBLE DRAWBACKS

* Another thing users would have to learn
* Would require changes to the SQL standard, i.e. SQL committee work
* Introduces a hard dependency on foreign keys, they cannot be dropped

SYNTAX

Syntax is hard, but here is a proposal to start the discussion:

from_item join_type from_item WITH [referencing_table_alias]->[foreign_key_constraint_name] = [referenced_table_alias] [ AS join_using_alias ]

EXAMPLE

To experiment with the idea, I wanted to find some real-world queries written by others,
to see how such SQL queries would look like, using traditional joins vs foreign key joins.

I came up with the idea of searching Github for "LEFT JOIN", since just searching for "JOIN" would match a lot of non-SQL code as well.
Here is one of the first examples I found, a query below from the Grafana project [1]
[1] https://github.com/grafana/grafana/blob/main/pkg/services/accesscontrol/database/resource_permissions.go

SELECT
p.*,
? AS resource_id,
ur.user_id AS user_id,
u.login AS user_login,
u.email AS user_email,
tr.team_id AS team_id,
t.name AS team,
t.email AS team_email,
r.name as role_name
FROM permission p
LEFT JOIN role r ON p.role_id = r.id
LEFT JOIN team_role tr ON r.id = tr.role_id
LEFT JOIN team t ON tr.team_id = t.id
LEFT JOIN user_role ur ON r.id = ur.role_id
LEFT JOIN user u ON ur.user_id = u.id
WHERE p.id = ?

Here is how the FROM clause could be rewritten:

FROM permission p
LEFT JOIN role r WITH p->permission_role_id_fkey = r
LEFT JOIN team_role tr WITH tr->team_role_role_id_fkey = r
LEFT JOIN team t WITH tr->team_role_team_id_fkey = t
LEFT JOIN user_role ur WITH ur->user_role_role_id_fkey = r
LEFT JOIN "user" u WITH ur->user_role_user_id_fkey = u
WHERE p.id = 1;

In PostgreSQL, the foreign keys could also be given shorter names, since they only need to be unique per table and not per namespace. I think a nice convention is to give the foreign keys the same name as the referenced table, except if the same table is referenced multiple times or is self-referenced.

Rewriting our example, using such naming convention for the foreign keys:

FROM permission p
LEFT JOIN role r WITH p->role = r
LEFT JOIN team_role tr WITH tr->role = r
LEFT JOIN team t WITH tr->team = t
LEFT JOIN user_role ur WITH ur->role = r
LEFT JOIN "user" u WITH ur->user = u
WHERE p.id = 1;

A better example to illustrate how conciseness is improved, would be one with lots of multi-column joins.
Please feel free to share better query examples to evaluate.

I cannot stop thinking about this idea, I really think it would greatly improve SQL as a language.
Foreign keys feels like such an underused valuable potential resource!
If someone can convince me this is a bad idea, that would help me forget about all of this,
so I would greatly appreciate your thoughts, no matter how negative or positive.

Thank you for digesting.

/Joel

PS.

To readers who might remember the old flawed version of this new idea:

In the old proposal, the third operand (referenced_table_alias) was missing.
There wasn't a way of specifying what table alias the join was supposed to be made against.
It was assumed the referenced table was always the one being joined in,
which is not always the case, since the referenced table
might already be in scope, and it's instead the referencing table which is being joined in.

Another problem with the old idea was you were forced to write the joins in a the same order
as the foreign keys, which often resulted in an awkward join order.

These two problems have now been solved with this new proposal.
Perhaps new problems have been introduced though?

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2021-12-25 21:06:44 Re: Foreign key joins revisited
Previous Message Pavel Stehule 2021-12-25 17:20:36 Re: Schema variables - new implementation for Postgres 15