Re: Convert NOT IN sublinks to anti-joins when safe

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: David Geier <geidav(dot)pg(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Convert NOT IN sublinks to anti-joins when safe
Date: 2026-02-05 07:29:54
Message-ID: CAGjGUAJNHuj3X8VB1yaRV5qaCWU1WgzMVt2yXeoEAXi-LdPZcA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

HI Richard
> As mentioned in my initial email, the goal of this patch is not to
> handle every possible case, but rather only to handle the basic form
> where both sides of NOT IN are provably non-nullable. This keeps the
> code complexity to a minimum, and I believe this would cover the most
> common use cases in real world.
Agree +1 ,The current path already covers common scenarios and is no less
comprehensive than other databases.I'm already quite pleased that it can be
merged.
Having tested a certain widely used open-source database, I found it unable
to process the following query: `SELECT * FROM join1 WHERE id NOT IN
(SELECT id FROM join2 WHERE id IS NOT NULL);` Note that join2 allows null
values for id.

Thanks

On Thu, Feb 5, 2026 at 2:09 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
> > If the sub-select can yield NULLs, the rewrite can be fixed by adding an
> > OR t2.c1 IS NULL clause, such as:
> >
> > SELECT t1.c1 FROM t1 WHERE
> > NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)
>
> I'm not sure if this rewrite results in a better plan. The OR clause
> would force a nested loop join, which could be much slower than a
> hashed-subplan plan.
>
> > If the outer expression can yield NULLs, the rewrite can be fixed by
> > adding a t1.c1 IS NOT NULL clause, such as:
> >
> > SELECT t1.c1 FROM T1 WHERE
> > t1.c1 IS NOT NULL AND
> > NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)
>
> This rewrite doesn't seem correct to me. If t2 is empty, you would
> incorrectly lose the NULL rows from t1 in the final result.
>
> > What's our today's take on doing more involved transformations inside
> > the planner to support such cases? It would greatly open up the scope of
> > the optimization.
>
> As mentioned in my initial email, the goal of this patch is not to
> handle every possible case, but rather only to handle the basic form
> where both sides of NOT IN are provably non-nullable. This keeps the
> code complexity to a minimum, and I believe this would cover the most
> common use cases in real world.
>
> - Richard
>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message lakshmi 2026-02-05 07:17:02 Re: parallel data loading for pgbench -i