| From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Cc: | "Arne Roland" <arne(dot)roland(at)malkut(dot)net>, "Anders Granlund" <anders(dot)granlund(dot)0(at)gmail(dot)com>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "Vik Fearing" <vik(at)chouppes(dot)com> |
| Subject: | Re: Key joins |
| Date: | 2026-05-28 22:13:52 |
| Message-ID: | 2bb4378a-cf8f-4375-b25b-046384335263@app.fastmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu, May 28, 2026, at 20:47, Joel Jacobson wrote:
> Hi hackers,
>
> This patch implements a new SQL language feature, that we intent to
> submit as a Change Proposal to the WG 3 SQL committee for the next
> meeting in Stockholm in June. We would greatly appreciate any feedback
> from the community.
>
> Key Joins
> ---------
>
> Mathematically, CROSS JOINs are a cartesian product, INNER JOINs a
> subset of it, and OUTER JOINs potentially null-extend that subset. In
> practice, however, most joins look up additional information along a
> foreign key rather than resembling a cartesian product.
>
> Today a written query does not convey whether a particular JOIN simply
> enriches the referencing side, filters it, or fans it out:
>
> -- both sums silently inflated by the fan-out
> SELECT o.id, SUM(oi.amount), SUM(p.amount)
> FROM orders o
> LEFT JOIN order_items oi ON oi.order_id = o.id
> LEFT JOIN payments p ON p.order_id = o.id
> GROUP BY o.id;
>
> The above example is based on a pgsql-generals thread [1].
>
> We propose a new JOIN syntax that makes it easy to determine locally
> that the immediate join result, before any further steps, just enriches
> the referencing side with information from the referenced side, with
> null-extension for OUTER JOINs. It conveys the author's intent, makes
> the referencing side visually clear, and is enforced at compile time
> against the schema. If we can't prove it, the user gets a compile-time
> error.
>
> Under FOR KEY the same query will not compile:
>
> SELECT o.id, SUM(oi.amount), SUM(p.amount)
> FROM orders o
> LEFT JOIN order_items oi FOR KEY (order_id) -> o (id)
> LEFT JOIN payments p FOR KEY (order_id) -> o (id)
> GROUP BY o.id;
>
> ERROR: key join from referencing relation p to referenced relation o
> cannot be proven
> LINE 7: LEFT JOIN payments AS p FOR KEY (order_id) -> o (id)
> ^
> DETAIL: Referenced columns o (id) are not proven unique. A preceding
> join may duplicate rows from referenced relation o.
>
> Web demo
> --------
>
> The attached Discussion paper has also been published at https://keyjoin.org
> with all examples in the paper runnable in the browser using a patched PGLite.
>
> Patches
> -------
>
> 0001
> The first patch is an attempt to fix a problem discussed in thread [2], where
> DROP FUNCTION can cause issues with concurrent dependency lookups. For key
> joins, this issue also extends to ALTER FUNCTION.
>
> The patch prevents stored expressions from depending on stale function
> OIDs by locking referenced procedures before recording dependencies, and
> by making CREATE OR REPLACE FUNCTION and ALTER FUNCTION take conflicting
> locks before changing pg_proc.
>
> 0002
> The second patch implements the FOR KEY join feature. As this is a
> first prototype, there are definitively things that needs to be
> improved. For example, we would love feedback on our the revalidation
> logic and our dependency tracking approach, that adds a new deptype for
> purpose of tracking the proof facts. Another problem we didn't find a
> perfect solution to, was our need to expand views during parse for proof
> checking and finding constraints.
>
> The logics to compute the facts needed by the proof checker are kind of
> complex, which is partially due to the ambition to not introduce
> overhead to queries not using the new feature.
>
> The patch comes with a massive test suite, that we understand will need
> to be trimmed down to have a chance to be committable.
>
> 0003
> The third patch adds information_schema.view_constraint_usage, that
> shows what constraints a view depend on, due to usage of key joins in
> the view's query. This is also part of the proposal.
>
> Joel Jacobson
> Vik Fearing
> Andreas Karlsson
> Arne Roland
> Anders Granlund
>
> [1] [Avoiding double-counting in aggregates with more than one join?]
> (https://www.postgresql.org/message-id/flat/86b9ec78-925c-1935-bc9d-6bad4ceb1f40(at)illuminatedcomputing(dot)com)
> [2] [RE: Parallel INSERT SELECT take 2]
> (https://www.postgresql.org/message-id/TY4PR01MB17718A4DE63020A9EA5E9CB6594382%40TY4PR01MB17718.jpnprd01.prod.outlook.com)
I noticed cfbot was red; I see I forgot to include these files in patch 0002:
src/test/modules/injection_points/expected/key_join_proof_race_record_proc_dep.out
src/test/modules/injection_points/specs/key_join_proof_race_record_proc_dep.spec
Fixed in new version attached.
/Joel
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0003-Add-information_schema.view_constraint_usage.patch.gz | application/x-gzip | 2.9 KB |
| v2-0002-Implement-FOR-KEY-join-support.patch.gz | application/x-gzip | 171.8 KB |
| v2-0001-Lock-procedures-before-recording-dependencies.patch.gz | application/x-gzip | 3.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2026-05-28 22:33:33 | Re: [PATCH] Improving index selection for logical replication apply with replica identity full |
| Previous Message | Tomas Vondra | 2026-05-28 22:11:32 | Re: should we have a fast-path planning for OLTP starjoins? |