Re: Convert NOT IN sublinks to anti-joins when safe

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Zhang Mingli <zmlpostgres(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Convert NOT IN sublinks to anti-joins when safe
Date: 2026-02-09 07:43:40
Message-ID: CAMbWs4-Fv7CqZ+M+S=P8peVbBpRfyov=k6zzVL2pKLBzSMHVFg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 4, 2026 at 1:49 PM Zhang Mingli <zmlpostgres(at)gmail(dot)com> wrote:
> 1. The comment doesn't match the SQL:
>
> +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable
> +-- by qual clause
> +SELECT * FROM not_null_tab
> +WHERE id NOT IN (
> + SELECT t1.id
> + FROM null_tab t1
> + INNER JOIN null_tab t2 ON t1.id = t2.id
> + LEFT JOIN null_tab t3 ON TRUE
> +);
>
> The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here.

The qual "t1.id = t2.id" here is sufficient to force t1.id (and t2.id)
nonnullable, as the operator is strict. It doesn't have to be an
explicit IS NOT NULL qual.

> 2. Another test case that could use a more precise comment:
>
> +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join,
> +-- inner side is defined NOT NULL
> +SELECT * FROM not_null_tab t1
> +LEFT JOIN not_null_tab t2
> +ON t2.id NOT IN (SELECT id FROM not_null_tab);
>
> Correct me if I’m wrong.
> This is a subtle case - the key point is that the ON clause is evaluated on actual t2 rows *before* LEFT JOIN's null-padding.
> The current comment is technically correct but might be clearer as:
>
> -- ANTI JOIN: outer side(t2) is defined NOT NULL.
> — ON clause is evaluated on actual t2 rows before LEFT JOIN's
> -- null-padding, so t2.id is NOT NULL; inner side is also defined NOT NULL

Hmm, I'm not sure if this is necessary. I don't think this test case
needs to concern itself with explaining standard JOIN/ON semantics.

> 3.Also, one suggestion for additional test coverage - the case where the subquery output comes from the nullable side of an outer join but is forced non-nullable by qual:
>
> -- ANTI JOIN: inner side comes from nullable side of outer join
> -- but is forced non-nullable by WHERE clause
> EXPLAIN (COSTS OFF)
> SELECT * FROM not_null_tab
> WHERE id NOT IN (
> SELECT t2.id
> FROM not_null_tab t1
> LEFT JOIN not_null_tab t2 ON t1.id = t2.id
> WHERE t2.id IS NOT NULL
> );
>
> The existing tests cover t1.id (non-nullable side) with IS NOT NULL, but not t2.id (nullable side).

Right, we can include this one in the test case.

- Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2026-02-09 07:52:10 Re: Changing the state of data checksums in a running cluster
Previous Message jian he 2026-02-09 07:43:38 Re: CREATE TABLE LIKE INCLUDING TRIGGERS