Re: Remove inner joins based on foreign keys

From: Tender Wang <tndrwang(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remove inner joins based on foreign keys
Date: 2026-04-30 07:50:04
Message-ID: CAHewXNk=iH75y43SWGVdmnfJAbUdzs_BZKyCQX7pN2gx+zKx5A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Richard,

Richard Guo <guofenglinux(at)gmail(dot)com> 于2026年4月28日周二 18:09写道:
>
> On Wed, Apr 1, 2026 at 6:45 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > Please see the v2 patch for the implementation details.

Thanks for this optimization. I look through this patch, and I have
two questions about this implementation.
This is my test case:
CREATE TABLE users (id int primary key, name text);
CREATE TABLE orders (id int primary key, user_id int references
users(id), amount int);
create table nation (id int, name text);

postgres=# explain select n.* from nation n join (orders o join users
u on o.user_id = u.id) on n.id = o.id;
QUERY PLAN
------------------------------------------------------------------------
Hash Join (cost=55.78..81.82 rows=1264 width=36)
Hash Cond: (n.id = o.id)
-> Seq Scan on nation n (cost=0.00..22.70 rows=1270 width=36)
-> Hash (cost=30.40..30.40 rows=2030 width=8)
-> Seq Scan on orders o (cost=0.00..30.40 rows=2030 width=8)
Filter: (user_id IS NOT NULL)
(6 rows)

The above plan is what we want. The inner join between orders and
users is removed thanks to the FK constraint.
But if I change the most left-side inner join to an outer join, the
plan doesn't seem to be what we want. Maybe I'm wrong.
postgres=# explain select n.* from nation n left join (orders o join
users u on o.user_id = u.id) on n.id = o.id;
QUERY PLAN
-----------------------------------------------------------------------------------
Hash Left Join (cost=99.85..140.01 rows=1270 width=36)
Hash Cond: (n.id = o.id)
-> Seq Scan on nation n (cost=0.00..22.70 rows=1270 width=36)
-> Hash (cost=74.35..74.35 rows=2040 width=4)
-> Hash Join (cost=38.58..74.35 rows=2040 width=4)
Hash Cond: (o.user_id = u.id)
-> Seq Scan on orders o (cost=0.00..30.40 rows=2040 width=8)
-> Hash (cost=22.70..22.70 rows=1270 width=4)
-> Seq Scan on users u (cost=0.00..22.70
rows=1270 width=4)
(9 rows)

The inner join between orders and users is not removed. It appears
this can be removed in this query. This is my first question.

The second question may not be related to this $SUBJECT.
I did some research about why the inner join cannot be removed in the
second query.
In the inner_join_is_removable(), we have the following check:
...
/*
* If the referenced relation has any restriction clauses or non-equality
* join clauses, they act as explicit filters. Since we cannot perform
* variable substitution to rewrite these clauses, we must abort.
*/
if (ref_rel->baserestrictinfo || ref_rel->joininfo)
return false;
...
The ref_rel is the users relation, and ref_rel->baserestrictinfo is
NIL, but the ref_rel->joininfo is not NIL.
In this query, the ref_rel->joininfo is (n.id = o.id), because it is a
left join, so n.id may not equal o.id.
So we have "non-equality join clauses" in the above comments.
My question is that the (n.id = o.id) clause seems not related to
users. Why does the planner add it to the users' RelOptInfo's
joininfo?

I only know that the thing happens in the
reconsider_outer_join_clauses(), the restrictinfo->required_relids
contains {1,2,3}, where the rtindex = 3 is the users relation.

--
Thanks,
Tender Wang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mats Kindahl 2026-04-30 08:19:21 pg_rewind does not rewind diverging timelines
Previous Message Dilip Kumar 2026-04-30 07:37:19 Re: Include schema-qualified names in publication error messages.