Re: Remove inner joins based on foreign keys

From: Ilmar Yunusov <tanswis42(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: Remove inner joins based on foreign keys
Date: 2026-06-05 14:07:53
Message-ID: 178066847358.594057.6986804936448401043.pgcf@coridan.postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi,

I looked at the v5 series from Richard's 2026-05-25 message, focusing on
apply/build status and on SQL-level cases where FK-based inner join removal
could change results.

I used both v5 patches:

v5-0001-Remove-inner-joins-based-on-foreign-keys.patch
v5-0002-Suppress-FK-based-inner-join-removal-during-the-t.patch

on origin/master at:

4cb2a9863d89b320f37eb1bd76822f6f65e59311

The series applies cleanly with git am, and git diff --check reports no
issues.

I first built with:

./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
make -s -j8
make -s install

make -C src/test/regress check

passed; all regression tests passed.

I also built a separate cassert/debug tree with:

./configure --prefix="$PWD/pg-install" --without-readline --without-zlib
--without-icu --enable-cassert --enable-debug 'CFLAGS=-O0 -g'
make -s -j8
make -s install

and ran:

make -C src/test/regress check

That also passed; all regression tests passed.

For manual checks, I used a small SQL corpus that compares query results with
enable_fk_inner_join_removal off and on, and inspects plans for the cases where
the referenced relation should or should not be removed.

The result comparisons passed for:

simple NOT NULL FK
nullable FK, where the plan added IS NOT NULL on the referencing column
multi-column nullable FK, where the plan added IS NOT NULL on both columns
NOT VALID FK
DEFERRABLE FK
an outer-join-adjacent case where removal still looked valid
referenced-side RLS
The plans also looked as expected in the rejection cases I checked:

NOT VALID and DEFERRABLE FKs kept the referenced relation
a referenced-side filter kept the referenced relation
TABLESAMPLE on the referenced relation kept the referenced relation
an outer join ON clause that referenced the FK referenced relation kept it
referenced-side RLS kept the parent scan, including the RLS filter
For the RLS case, the plan as a non-owner role included:

Seq Scan on rls_parent p
Filter: visible

and the result comparison against enable_fk_inner_join_removal = off passed.

I also checked the current lock-based trigger-gap guard in v5 0002.

With plan_cache_mode = force_generic_plan, the first EXPLAIN EXECUTE of a
prepared SELECT used the FK-removed generic plan:

Seq Scan on gap_child c
Filter: (id > $1)

After an INSERT on the child table held RowExclusiveLock in the same
transaction, EXPLAIN EXECUTE used a custom plan that retained the parent join:

Nested Loop
Join Filter: (p.id = c.pid)
-> Seq Scan on gap_child c
Filter: (id > 0)
-> Materialize
-> Seq Scan on gap_parent p

pg_prepared_statements showed generic_plans = 1 and custom_plans increasing
from 0 to 1 for that prepared statement.

I also tested a cached VOLATILE plpgsql function called from DELETE RETURNING
during the RI trigger-gap window. The function's SELECT was first executed
outside the gap, then the DELETE RETURNING call ran after the parent row had
been deleted but before the ON DELETE CASCADE action had removed the child
row. The function returned the joined count I expected from the non-removed
join, not the count that would have resulted from blindly reusing the
FK-removed plan.

I did not find a wrong-result case in this pass.

I am not trying to settle the broader choice between the lock-based predicate
and the other approaches described in the v5 email. I only checked the current
v5 lock-based implementation against the cases above. Based on those checks,
the predicate behaved as intended in the prepared-plan and VOLATILE function
trigger-gap cases I tried.

Two small test-coverage thoughts:

The referenced-side RLS case might be worth adding to the regression tests,
since FK existence does not imply that the current user can see the parent
row.

If the lock-based predicate remains the selected approach, a cached
VOLATILE plpgsql function inside DELETE RETURNING may be a useful regression
test for the trigger-gap path, because it exercises a cached plan outside
the simple PREPARE/EXECUTE path.

I did not run installcheck-world, and I did not do an exhaustive planner
counterexample search.

Regards,
Ilmar Yunusov

The new status of this patch is: Waiting on Author

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-06-05 14:08:29 Re: Fix domain fast defaults on empty tables
Previous Message Bruce Momjian 2026-06-05 13:55:27 Re: First draft of PG 19 release notes