| From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
|---|---|
| To: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Convert NOT IN sublinks to anti-joins when safe |
| Date: | 2026-02-03 07:12:48 |
| Message-ID: | CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.
I believe we are now in a much better position to attempt this again.
The planner has accumulated significant infrastructure that makes this
proof straightforward and reliable. Specifically, we can now leverage
the outer-join-aware-Var infrastructure to tell whether a Var comes
from the nullable side of an outer join, and the not-null-attnums hash
table to efficiently check whether a Var is defined NOT NULL. We also
have the expr_is_nonnullable() function that is smart enough to deduce
non-nullability for expressions more complex than simple Vars/Consts.
Attached is a draft patch for this attempt (part of the code is
adapted from an old patch [1] by David and Tom). This patch aims for
a conservative implementation: the goal is not to handle every
theoretical case, but to handle canonical query patterns with minimal
code complexity.
The patch primarily targets patterns like:
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users);
... and
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM
banned_users WHERE user_id IS NOT NULL);
This is a very typical syntax for exclusion. In well-modeled
databases, join keys like id and user_id are very likely to be defined
as NOT NULL.
It seems to me that the ROI here is quite positive: the added code
complexity is very low (thanks to the existing infrastructure), while
the benefit is that users writing this typical pattern will finally
get efficient anti-join plans without needing manual rewrites.
(For the outer expressions, we could potentially also use outer query
quals to prove non-nullability. This patch does not attempt to do so.
Implementing this would require passing state down during the
pull_up_sublinks recursion; and given that find_nonnullable_vars can
fail to prove non-nullability in many cases due to the lack of
const-simplification at this stage, I'm not sure whether it is worth
the code complexity. Besides, I haven't fully convinced myself that
doing this does not introduce correctness issues.)
Any thoughts?
[1] https://postgr.es/m/13766.1405037879@sss.pgh.pa.us
- Richard
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch | application/octet-stream | 53.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-02-03 07:15:57 | Re: Remove freelist reference in buf_init.c |
| Previous Message | Kyotaro Horiguchi | 2026-02-03 06:49:47 | Re: pg_resetwal: Fix wrong directory in log output |