Re: Key joins

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: "Arne Roland" <arne(dot)roland(at)malkut(dot)net>, "Anders Granlund" <anders(dot)granlund(dot)0(at)gmail(dot)com>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "Vik Fearing" <vik(at)chouppes(dot)com>
Subject: Re: Key joins
Date: 2026-05-28 22:13:52
Message-ID: 2bb4378a-cf8f-4375-b25b-046384335263@app.fastmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 28, 2026, at 20:47, Joel Jacobson wrote:
> Hi hackers,
>
> This patch implements a new SQL language feature, that we intent to
> submit as a Change Proposal to the WG 3 SQL committee for the next
> meeting in Stockholm in June. We would greatly appreciate any feedback
> from the community.
>
> Key Joins
> ---------
>
> Mathematically, CROSS JOINs are a cartesian product, INNER JOINs a
> subset of it, and OUTER JOINs potentially null-extend that subset. In
> practice, however, most joins look up additional information along a
> foreign key rather than resembling a cartesian product.
>
> Today a written query does not convey whether a particular JOIN simply
> enriches the referencing side, filters it, or fans it out:
>
> -- both sums silently inflated by the fan-out
> SELECT o.id, SUM(oi.amount), SUM(p.amount)
> FROM orders o
> LEFT JOIN order_items oi ON oi.order_id = o.id
> LEFT JOIN payments p ON p.order_id = o.id
> GROUP BY o.id;
>
> The above example is based on a pgsql-generals thread [1].
>
> We propose a new JOIN syntax that makes it easy to determine locally
> that the immediate join result, before any further steps, just enriches
> the referencing side with information from the referenced side, with
> null-extension for OUTER JOINs. It conveys the author's intent, makes
> the referencing side visually clear, and is enforced at compile time
> against the schema. If we can't prove it, the user gets a compile-time
> error.
>
> Under FOR KEY the same query will not compile:
>
> SELECT o.id, SUM(oi.amount), SUM(p.amount)
> FROM orders o
> LEFT JOIN order_items oi FOR KEY (order_id) -> o (id)
> LEFT JOIN payments p FOR KEY (order_id) -> o (id)
> GROUP BY o.id;
>
> ERROR: key join from referencing relation p to referenced relation o
> cannot be proven
> LINE 7: LEFT JOIN payments AS p FOR KEY (order_id) -> o (id)
> ^
> DETAIL: Referenced columns o (id) are not proven unique. A preceding
> join may duplicate rows from referenced relation o.
>
> Web demo
> --------
>
> The attached Discussion paper has also been published at https://keyjoin.org
> with all examples in the paper runnable in the browser using a patched PGLite.
>
> Patches
> -------
>
> 0001
> The first patch is an attempt to fix a problem discussed in thread [2], where
> DROP FUNCTION can cause issues with concurrent dependency lookups. For key
> joins, this issue also extends to ALTER FUNCTION.
>
> The patch prevents stored expressions from depending on stale function
> OIDs by locking referenced procedures before recording dependencies, and
> by making CREATE OR REPLACE FUNCTION and ALTER FUNCTION take conflicting
> locks before changing pg_proc.
>
> 0002
> The second patch implements the FOR KEY join feature. As this is a
> first prototype, there are definitively things that needs to be
> improved. For example, we would love feedback on our the revalidation
> logic and our dependency tracking approach, that adds a new deptype for
> purpose of tracking the proof facts. Another problem we didn't find a
> perfect solution to, was our need to expand views during parse for proof
> checking and finding constraints.
>
> The logics to compute the facts needed by the proof checker are kind of
> complex, which is partially due to the ambition to not introduce
> overhead to queries not using the new feature.
>
> The patch comes with a massive test suite, that we understand will need
> to be trimmed down to have a chance to be committable.
>
> 0003
> The third patch adds information_schema.view_constraint_usage, that
> shows what constraints a view depend on, due to usage of key joins in
> the view's query. This is also part of the proposal.
>
> Joel Jacobson
> Vik Fearing
> Andreas Karlsson
> Arne Roland
> Anders Granlund
>
> [1] [Avoiding double-counting in aggregates with more than one join?]
> (https://www.postgresql.org/message-id/flat/86b9ec78-925c-1935-bc9d-6bad4ceb1f40(at)illuminatedcomputing(dot)com)
> [2] [RE: Parallel INSERT SELECT take 2]
> (https://www.postgresql.org/message-id/TY4PR01MB17718A4DE63020A9EA5E9CB6594382%40TY4PR01MB17718.jpnprd01.prod.outlook.com)

I noticed cfbot was red; I see I forgot to include these files in patch 0002:
src/test/modules/injection_points/expected/key_join_proof_race_record_proc_dep.out
src/test/modules/injection_points/specs/key_join_proof_race_record_proc_dep.spec

Fixed in new version attached.

/Joel

Attachment Content-Type Size
v2-0003-Add-information_schema.view_constraint_usage.patch.gz application/x-gzip 2.9 KB
v2-0002-Implement-FOR-KEY-join-support.patch.gz application/x-gzip 171.8 KB
v2-0001-Lock-procedures-before-recording-dependencies.patch.gz application/x-gzip 3.3 KB

In response to

  • Key joins at 2026-05-28 18:47:50 from Joel Jacobson

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2026-05-28 22:33:33 Re: [PATCH] Improving index selection for logical replication apply with replica identity full
Previous Message Tomas Vondra 2026-05-28 22:11:32 Re: should we have a fast-path planning for OLTP starjoins?