Convert NOT IN sublinks to anti-joins when safe

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Convert NOT IN sublinks to anti-joins when safe
Date: 2026-02-03 07:12:48
Message-ID: CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.

I believe we are now in a much better position to attempt this again.
The planner has accumulated significant infrastructure that makes this
proof straightforward and reliable. Specifically, we can now leverage
the outer-join-aware-Var infrastructure to tell whether a Var comes
from the nullable side of an outer join, and the not-null-attnums hash
table to efficiently check whether a Var is defined NOT NULL. We also
have the expr_is_nonnullable() function that is smart enough to deduce
non-nullability for expressions more complex than simple Vars/Consts.

Attached is a draft patch for this attempt (part of the code is
adapted from an old patch [1] by David and Tom). This patch aims for
a conservative implementation: the goal is not to handle every
theoretical case, but to handle canonical query patterns with minimal
code complexity.

The patch primarily targets patterns like:

SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users);

... and

SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM
banned_users WHERE user_id IS NOT NULL);

This is a very typical syntax for exclusion. In well-modeled
databases, join keys like id and user_id are very likely to be defined
as NOT NULL.

It seems to me that the ROI here is quite positive: the added code
complexity is very low (thanks to the existing infrastructure), while
the benefit is that users writing this typical pattern will finally
get efficient anti-join plans without needing manual rewrites.

(For the outer expressions, we could potentially also use outer query
quals to prove non-nullability. This patch does not attempt to do so.
Implementing this would require passing state down during the
pull_up_sublinks recursion; and given that find_nonnullable_vars can
fail to prove non-nullability in many cases due to the lack of
const-simplification at this stage, I'm not sure whether it is worth
the code complexity. Besides, I haven't fully convinced myself that
doing this does not introduce correctness issues.)

Any thoughts?

[1] https://postgr.es/m/13766.1405037879@sss.pgh.pa.us

- Richard

Attachment Content-Type Size
v1-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch application/octet-stream 53.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-02-03 07:15:57 Re: Remove freelist reference in buf_init.c
Previous Message Kyotaro Horiguchi 2026-02-03 06:49:47 Re: pg_resetwal: Fix wrong directory in log output