Re: Planner : anti-join on left joins

From: Tender Wang <tndrwang(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Nicolas Adenis-Lamarre <nicolas(dot)adenis(dot)lamarre(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Planner : anti-join on left joins
Date: 2026-01-06 01:47:43
Message-ID: CAHewXNkHfPiOE26r061aCK1AGm=s=1EtBL8vZnHtujeD6Xm1gg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Richard Guo <guofenglinux(at)gmail(dot)com> 于2026年1月5日周一 21:26写道:

> On Mon, Jan 5, 2026 at 7:59 PM Tender Wang <tndrwang(at)gmail(dot)com> wrote:
> > Yeah, I forgot to add your suggestion. Please check the v4 patch.
> > I combined the code and tests into one patch and also added it to [1]
> commitfest.
>
> I looked through the patch and I like the idea. This is a very nice
> optimization opportunity.
>
> Regarding the implementation, I'm afraid that this patch can be wrong
> in a couple of cases.
>
> * When checking NOT NULL constraints to see if a var is non-nullable,
> this patch overlooks cases where the var can be nullable due to
> lower-level outer joins. For example, given tables t1, t2, t3 with
> schema (a NOT NULL, b, c), this patch would produce an incorrect plan
> for the query below.
>
> explain (costs off)
> select * from t1 left join
> (t2 left join t3 on t2.c = t3.c) on t1.b = t2.b
> where t3.a is null;
> QUERY PLAN
> ----------------------------------------
> Hash Anti Join
> Hash Cond: (t1.b = t2.b)
> -> Seq Scan on t1
> -> Hash
> -> Hash Left Join
> Hash Cond: (t2.c = t3.c)
> -> Seq Scan on t2
> -> Hash
> -> Seq Scan on t3
> (9 rows)
>
> Although t3.a references a not-null column, it can become null due to
> the t2/t3 join. So the upper left join should not be reduced to an
> anti join.
>

Yes, you're right.
I forgot to handle this case. Good example.

>
> * This patch doesn't account for inheritance parent tables, where some
> child tables might have a NOT NULL constraint for a column while
> others do not.
>

Yeah, another case I forgot to handle.

>
> The second point is straightforward to fix, but the first is more
> tricky since we did not record varnullingrels in forced_null_vars.
> One simple fix would be to only perform this optimization when
> right_state->contains_outer is false (meaning no outer joins exist in
> the RHS). However, this would be too restrictive and miss many
> optimization opportunities.
>
> Instead, I'm considering recording the relids of base rels that are
> nullable within each subtree in reduce_outer_joins_pass1_state. This
> would allow us to skip Vars that come from those rels when checking
> NOT NULL constraints. Something like attached.
>

I looked through the v5 patch. I don't think there's any objection from me.
To nitpick,
...
SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.z IS NULL;
* If we can prove that b.z must be non-null for any matching row, either
* because the join clause is strict for b.z, or because b.z is defined NOT
* NULL by table constraints,
...
How can the strict join clause influence b.z?
...
SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
...
The original query in the comments, I can understand. If the join clause is
strict, the b.y of the matched row can't be null.

--
Thanks,
Tender Wang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2026-01-06 02:03:48 Re: doc: update the default of data checksums in the doc of pg_rewind
Previous Message David Rowley 2026-01-06 01:40:19 Re: Fix outdated comments in catcache.h