Re: Convert NOT IN sublinks to anti-joins when safe

From: Zhang Mingli <zmlpostgres(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: Convert NOT IN sublinks to anti-joins when safe
Date: 2026-02-04 04:49:43
Message-ID: 3b45ceea-016e-475a-a2c0-321d52f4d7a7@Spark
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Feb 3, 2026 at 15:13 +0800, Richard Guo <guofenglinux(at)gmail(dot)com>, wrote:
>
> This topic has been discussed several times in the past. Due to the
> semantic mismatch regarding NULL handling, NOT IN is not ordinarily
> equivalent to an anti-join. However, if we can prove that neither the
> outer expressions nor the subquery outputs can yield NULL values, it
> should be safe to convert NOT IN to an anti-join.

Thanks for working on this!
I've reviewed the patch and it looks good overall.

I noticed several minor issues in the test case comments:

1. The comment doesn't match the SQL:

+-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable
+-- by qual clause
+SELECT * FROM not_null_tab
+WHERE id NOT IN (
+    SELECT t1.id
+    FROM null_tab t1
+    INNER JOIN null_tab t2 ON t1.id = t2.id
+    LEFT JOIN null_tab t3 ON TRUE
+);

The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here.

2. Another test case that could use a more precise comment:

+-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join,
+-- inner side is defined NOT NULL
+SELECT * FROM not_null_tab t1
+LEFT JOIN not_null_tab t2
+ON t2.id NOT IN (SELECT id FROM not_null_tab);

Correct me if I’m wrong.
This is a subtle case - the key point is that the ON clause is evaluated on actual t2 rows *before* LEFT JOIN's null-padding.
The current comment is technically correct but might be clearer as:

-- ANTI JOIN: outer side(t2) is defined NOT NULL.
— ON clause is evaluated on actual t2 rows before LEFT JOIN's
-- null-padding, so t2.id is NOT NULL; inner side is also defined NOT NULL

3.Also, one suggestion for additional test coverage - the case where the subquery output comes from the nullable side of an outer join but is forced non-nullable by qual:

-- ANTI JOIN: inner side comes from nullable side of outer join
-- but is forced non-nullable by WHERE clause
EXPLAIN (COSTS OFF)
SELECT * FROM not_null_tab
WHERE id NOT IN (
    SELECT t2.id
    FROM not_null_tab t1
    LEFT JOIN not_null_tab t2 ON t1.id = t2.id
    WHERE t2.id IS NOT NULL
);

The existing tests cover t1.id (non-nullable side) with IS NOT NULL, but not t2.id (nullable side).

If I read the code correctly, this should work via find_subquery_safe_quals + find_nonnullable_vars, but explicit coverage would be good.

And I test it:
                     QUERY PLAN
----------------------------------------------------
 Hash Anti Join
   Hash Cond: (not_null_tab.id = t2.id)
   ->  Seq Scan on not_null_tab
   ->  Hash
         ->  Merge Join
               Merge Cond: (t2.id = t1.id)
               ->  Sort
                     Sort Key: t2.id
                     ->  Seq Scan on null_tab t2
                           Filter: (id IS NOT NULL)
               ->  Sort
                     Sort Key: t1.id
                     ->  Seq Scan on null_tab t1
(13 rows)

--
Zhang Mingli
HashData

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2026-02-04 04:53:20 Re: [19] CREATE SUBSCRIPTION ... SERVER
Previous Message vignesh C 2026-02-04 04:46:22 Re: [Proposal] Adding Log File Capability to pg_createsubscriber