| From: | Zhang Mingli <zmlpostgres(at)gmail(dot)com> |
|---|---|
| To: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com> |
| Subject: | Re: Convert NOT IN sublinks to anti-joins when safe |
| Date: | 2026-02-04 04:49:43 |
| Message-ID: | 3b45ceea-016e-475a-a2c0-321d52f4d7a7@Spark |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Feb 3, 2026 at 15:13 +0800, Richard Guo <guofenglinux(at)gmail(dot)com>, wrote:
>
> This topic has been discussed several times in the past. Due to the
> semantic mismatch regarding NULL handling, NOT IN is not ordinarily
> equivalent to an anti-join. However, if we can prove that neither the
> outer expressions nor the subquery outputs can yield NULL values, it
> should be safe to convert NOT IN to an anti-join.
Thanks for working on this!
I've reviewed the patch and it looks good overall.
I noticed several minor issues in the test case comments:
1. The comment doesn't match the SQL:
+-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable
+-- by qual clause
+SELECT * FROM not_null_tab
+WHERE id NOT IN (
+ SELECT t1.id
+ FROM null_tab t1
+ INNER JOIN null_tab t2 ON t1.id = t2.id
+ LEFT JOIN null_tab t3 ON TRUE
+);
The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here.
2. Another test case that could use a more precise comment:
+-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join,
+-- inner side is defined NOT NULL
+SELECT * FROM not_null_tab t1
+LEFT JOIN not_null_tab t2
+ON t2.id NOT IN (SELECT id FROM not_null_tab);
Correct me if I’m wrong.
This is a subtle case - the key point is that the ON clause is evaluated on actual t2 rows *before* LEFT JOIN's null-padding.
The current comment is technically correct but might be clearer as:
-- ANTI JOIN: outer side(t2) is defined NOT NULL.
— ON clause is evaluated on actual t2 rows before LEFT JOIN's
-- null-padding, so t2.id is NOT NULL; inner side is also defined NOT NULL
3.Also, one suggestion for additional test coverage - the case where the subquery output comes from the nullable side of an outer join but is forced non-nullable by qual:
-- ANTI JOIN: inner side comes from nullable side of outer join
-- but is forced non-nullable by WHERE clause
EXPLAIN (COSTS OFF)
SELECT * FROM not_null_tab
WHERE id NOT IN (
SELECT t2.id
FROM not_null_tab t1
LEFT JOIN not_null_tab t2 ON t1.id = t2.id
WHERE t2.id IS NOT NULL
);
The existing tests cover t1.id (non-nullable side) with IS NOT NULL, but not t2.id (nullable side).
If I read the code correctly, this should work via find_subquery_safe_quals + find_nonnullable_vars, but explicit coverage would be good.
And I test it:
QUERY PLAN
----------------------------------------------------
Hash Anti Join
Hash Cond: (not_null_tab.id = t2.id)
-> Seq Scan on not_null_tab
-> Hash
-> Merge Join
Merge Cond: (t2.id = t1.id)
-> Sort
Sort Key: t2.id
-> Seq Scan on null_tab t2
Filter: (id IS NOT NULL)
-> Sort
Sort Key: t1.id
-> Seq Scan on null_tab t1
(13 rows)
--
Zhang Mingli
HashData
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2026-02-04 04:53:20 | Re: [19] CREATE SUBSCRIPTION ... SERVER |
| Previous Message | vignesh C | 2026-02-04 04:46:22 | Re: [Proposal] Adding Log File Capability to pg_createsubscriber |