| From: | wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
|---|---|
| To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
| Cc: | Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Reduce LEFT/FULL JOIN to ANTI JOIN in more cases |
| Date: | 2026-06-05 09:32:47 |
| Message-ID: | CAGjGUALUXJbLjv8fAs6w+JgxgmKCXE_rnOwZY=4pQj7Mrsy_Qg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Richard
Thanks for working on this. I think the overall idea makes sense: if
a
a strict qual inside the RHS subtree proves that a RHS Var is non-null for
every row emitted by that subtree, then using that proof to recognize the
LEFT JOIN ... IS NULL case as an anti join seems valid .
The propagation rules also look conservative: inner/semi joins can carry
proofs from both sides plus their own quals, while outer joins only carry
proofs from the non-nullable side.
Maybe I'm overthinking this, but one implementation detail worth considering
is whether pass1 should store the derived proof directly, rather than
carrying
the original safe qual lists around. `safe_quals` seems to be used later
only
to call `find_nonnullable_vars()`, so the state could perhaps keep something
like:
```
List *safe_nonnullable_vars;
then, when a safe quail is found:
result->safe_nonnullable_vars =
mbms_add_members(result->safe_nonnullable_vars,
find_nonnullable_vars(f->quals));
And in the anti-join check:
nonnullable_vars =
mbms_add_members(NIL, right_state->safe_nonnullable_vars);
nonnullable_vars =
mbms_add_members(nonnullable_vars,
find_nonnullable_vars(j->quals));
That would keep the pass1 state closer to what pass2 actually needs, and
avoid
carrying/concatenating implicit-AND qual lists only to rescan them later.
One
small detail is that mbms_add_members() mutates its first argument, so when
propagating a child state through an outer join we'd want to copy it, e.g.
with
mbms_add_members(NIL, child_state->safe_nonnullable_vars), rather than
assign
the child list directly.
This is not a correctness objection to the current patch, just a possible
way
to make the stored state match its only consumer more closely.
Thanks
On Fri, Jun 5, 2026 at 3:50 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> On Fri, Jun 5, 2026 at 12:00 PM Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
> wrote:
> > On Thu, 4 Jun 2026 at 22:55, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> >> reduce_outer_joins() already recognizes that a LEFT JOIN is really an
> >> anti-join when an upper qual forces a nullable-side Var to be NULL
> >> while that Var is actually non-null in every matching row. In that
> >> case only the null-extended (unmatched) rows can satisfy the upper
> >> qual, which is exactly anti-join semantics, so we switch JOIN_LEFT to
> >> JOIN_ANTI. This is worth detecting because an anti-join is usually
> >> much cheaper than computing the outer join and then filtering the
> >> result with the IS NULL clause.
>
> > While you’re at it, any chance of changing it so that "a LEFT JOIN b
> WHERE b IS NULL" is guaranteed not to have worse performance than " a LEFT
> JOIN b WHERE b.f IS NULL"? I've had this and asked about it here and it was
> suggested that I should specify the primary key fields of b; but surely if
> I want an antijoin it's clearer and better implicit documentation if I
> don't zero in on specific fields. I think somebody else is suggesting "a
> LEFT ANTI JOIN b" which would be even better.
>
> The reason the planner doesn't recognize "WHERE b IS NULL" during
> outer-join reduction is that find_forced_null_var() skips NullTest
> nodes with argisrow=true, which is what the parser produces for a
> whole-row IS NULL test. So the anti-join reduction never sees it.
>
> To handle it we'd need to decompose the whole-row test into per-column
> checks, and that expansion needs the relation's column list, which
> find_forced_null_var() doesn't have. One option would be to let
> find_forced_null_var() recognize the whole-row IS NULL and hand back
> the whole-row Var, then have reduce_outer_joins() expand it into the
> relation's columns before running the non-null proof.
>
> That said, I'm not sure it's worth the churn. IMHO, the whole-row IS
> NULL test seems less common than the column form. If you want an
> anti-join, it seems more natural to just write NOT EXISTS. For the
> same reason, I'm not convinced a "LEFT ANTI JOIN" would buy us much
> over NOT EXISTS.
>
> - Richard
>
>
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jelte Fennema-Nio | 2026-06-05 09:34:14 | Re: alert clients when prepared statements are deallocated |
| Previous Message | Chao Li | 2026-06-05 09:31:31 | Prevent remote libpq notices from being sent to clients |