Re: Convert NOT IN sublinks to anti-joins when safe

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Convert NOT IN sublinks to anti-joins when safe
Date: 2026-02-05 06:09:17
Message-ID: CAMbWs49nvNcBaUXTw5_euodb7ONADwDULJ4Cxw5qurDXdurc+Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
> If the sub-select can yield NULLs, the rewrite can be fixed by adding an
> OR t2.c1 IS NULL clause, such as:
>
> SELECT t1.c1 FROM t1 WHERE
> NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)

I'm not sure if this rewrite results in a better plan. The OR clause
would force a nested loop join, which could be much slower than a
hashed-subplan plan.
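
One way to check this is to compare the two plans directly. The following is a minimal sketch, not a benchmark: the table definitions, row counts, and data distribution are assumptions made up for illustration, and the plans actually chosen will depend on the server version, statistics, and settings such as work_mem.

```sql
-- Hypothetical test setup; everything is rolled back at the end.
BEGIN;
CREATE TEMP TABLE t1 (c1 int);
CREATE TEMP TABLE t2 (c1 int);
INSERT INTO t1 SELECT g FROM generate_series(1, 100000) AS g;
INSERT INTO t2 SELECT g FROM generate_series(1, 100000, 2) AS g;
ANALYZE t1;
ANALYZE t2;

-- Original NOT IN form.
EXPLAIN
SELECT t1.c1 FROM t1
WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2);

-- Proposed rewrite: the join condition is an OR, not a plain equality.
EXPLAIN
SELECT t1.c1 FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2
                  WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL);
ROLLBACK;
```

Swapping EXPLAIN for EXPLAIN ANALYZE would also show actual run times for the two forms.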

> If the outer expression can yield NULLs, the rewrite can be fixed by
> adding a t1.c1 IS NOT NULL clause, such as:
>
> SELECT t1.c1 FROM T1 WHERE
> t1.c1 IS NOT NULL AND
> NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)

This rewrite doesn't seem correct to me. If t2 is empty, you would
incorrectly lose the NULL rows from t1 in the final result.
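
A small illustration of the empty-t2 case, using hypothetical throwaway tables (the definitions and data are assumptions, chosen only to show the difference):

```sql
-- Hypothetical test setup; everything is rolled back at the end.
BEGIN;
CREATE TEMP TABLE t1 (c1 int);
CREATE TEMP TABLE t2 (c1 int);
INSERT INTO t1 VALUES (1), (NULL);
-- t2 is intentionally left empty.

-- NOT IN over an empty set is true for every outer row,
-- so this returns both rows of t1: 1 and NULL.
SELECT t1.c1 FROM t1
WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2);

-- The proposed rewrite filters out the NULL row up front,
-- so this returns only the row with c1 = 1.
SELECT t1.c1 FROM t1
WHERE t1.c1 IS NOT NULL AND
      NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1);
ROLLBACK;
```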

> What's our today's take on doing more involved transformations inside
> the planner to support such cases? It would greatly open up the scope of
> the optimization.

As mentioned in my initial email, the goal of this patch is not to
handle every possible case, but rather only to handle the basic form
where both sides of NOT IN are provably non-nullable. This keeps the
code complexity to a minimum, and I believe this would cover the most
common use cases in the real world.
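
For illustration, here is a sketch of the basic form. The tables are hypothetical, and using NOT NULL column constraints as the source of the non-nullability proof is an assumption made for this example, not a statement of exactly what the patch checks.

```sql
-- Hypothetical test setup; everything is rolled back at the end.
BEGIN;
CREATE TEMP TABLE t1 (c1 int NOT NULL);
CREATE TEMP TABLE t2 (c1 int NOT NULL);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES (2);

-- With no NULLs possible on either side, NOT IN has plain
-- anti-join semantics. This returns 1 and 3.
SELECT t1.c1 FROM t1
WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2)
ORDER BY 1;

-- The equivalent NOT EXISTS form also returns 1 and 3.
SELECT t1.c1 FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)
ORDER BY 1;
ROLLBACK;
```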

- Richard