Re: Convert NOT IN sublinks to anti-joins when safe

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Convert NOT IN sublinks to anti-joins when safe
Date: 2026-03-02 12:50:37
Message-ID: 0dd7bed2-2a1d-4a12-bddf-e00013744eab@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05.02.2026 07:09, Richard Guo wrote:
> On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
>> If the sub-select can yield NULLs, the rewrite can be fixed by adding an
>> OR t2.c1 IS NULL clause, such as:
>>
>> SELECT t1.c1 FROM t1 WHERE
>> NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)
>
> I'm not sure if this rewrite results in a better plan. The OR clause
> would force a nested loop join, which could be much slower than a
> hashed-subplan plan.

That's why I had shared a variant that doesn't have the OR but a instead
uses a second NOT EXISTS:

SELECT t1.c1 FROM t1 WHERE
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)

>> If the outer expression can yield NULLs, the rewrite can be fixed by
>> adding a t1.c1 IS NOT NULL clause, such as:
>>
>> SELECT t1.c1 FROM T1 WHERE
>> t1.c1 IS NOT NULL AND
>> NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)
>
> This rewrite doesn't seem correct to me. If t2 is empty, you would
> incorrectly lose the NULL rows from t1 in the final result.

Yes, that rewrite was only for the case where the outer expression can
yield NULLs but the sub-query cannot.

The very last rewrite combines both cases. The rewritten query then
looks like:

SELECT t1.c1 FROM T1 WHERE
t1.c1 IS NOT NULL AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND
NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)

>> What's our today's take on doing more involved transformations inside
>> the planner to support such cases? It would greatly open up the scope of
>> the optimization.
>
> As mentioned in my initial email, the goal of this patch is not to
> handle every possible case, but rather only to handle the basic form
> where both sides of NOT IN are provably non-nullable. This keeps the
> code complexity to a minimum, and I believe this would cover the most
> common use cases in real world.

Seems reasonable to start with the non-NULL variant, though there are
certainly cases where there's no PK / unique index on the relevant columns.

--
David Geier

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2026-03-02 13:00:24 Re: pg_waldump: support decoding of WAL inside tarfile
Previous Message VASUKI M 2026-03-02 12:48:05 Re: Optional skipping of unchanged relations during ANALYZE?