Re: BUG #19071: commit b448f1c8d broke LEFT JOIN pushdown

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: a(dot)ratundalov(at)postgrespro(dot)ru
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: BUG #19071: commit b448f1c8d broke LEFT JOIN pushdown
Date: 2025-10-03 18:48:59
Message-ID: 1355268.1759517339@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> QUERY :
> EXPLAIN( verbose, costs off )
> SELECT * FROM a LEFT JOIN b ON a.c1=b.c1 WHERE a.c1=6;
> PLAN :
> QUERY PLAN
> ------------------------------------------------------------------------
> Nested Loop Left Join
> Output: a.c1, a.c2, b.c1, b.c2
> -> Foreign Scan on public.a_p2 a
> Output: a.c1, a.c2
> Remote SQL: SELECT c1, c2 FROM public.a WHERE ((c1 = 6))
> -> Materialize
> Output: b.c1, b.c2
> -> Foreign Scan on public.b_p2 b
> Output: b.c1, b.c2
> Remote SQL: SELECT c1, c2 FROM public.b WHERE ((c1 = 6))
> (10 rows)
>
> PLAN BEFORE THE CHANGES :
> QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> Foreign Scan
> Output: a.c1, a.c2, b.c1, b.c2
> Relations: (public.a_p2 a) LEFT JOIN (public.b_p2 b)
> Remote SQL: SELECT r4.c1, r4.c2, r5.c1, r5.c2 FROM (public.a r4 LEFT JOIN
> public.b r5 ON (((r4.c1 = r5.c1)) AND ((r5.c1 = 6)))) WHERE ((r4.c1 = 6))
> (4 rows)

Thanks for the report. This is not actually specific to foreign-join
pushdown: the problem is that the partitionwise-join machinery does
not realize that this could be done as a partitionwise join. That's
because it needs to see the a.c1 = b.c1 condition, and it doesn't.
You're correct that this changed at b448f1c8d, and the reason is
explained in that commit message:

Use constant TRUE for "dummy" clauses when throwing back outer joins.
This improves on a hack I introduced in commit 6a6522529. If we
have a left-join clause l.x = r.y, and a WHERE clause l.x = constant,
we generate r.y = constant and then don't really have a need for the
join clause. But we must throw the join clause back anyway after
marking it redundant, so that the join search heuristics won't think
this is a clauseless join and avoid it. That was a kluge introduced
under time pressure, and after looking at it I thought of a better
way: let's just introduce constant-TRUE "join clauses" instead,
and get rid of them at the end. This improves the generated plans for
such cases by not having to test a redundant join clause. We can also
get rid of the ugly hack used to mark such clauses as redundant for
selectivity estimation.

Since we no longer "throw back" the redundant join clause, it's not
there for have_partkey_equi_join to find.

I don't feel too bad about this, because it's very specific to the
query pattern "a LEFT JOIN b ON a.x = b.y WHERE a.x = constant".
Closely-related cases like "a JOIN b ON a.x = b.y WHERE a.x = constant"
were not recognized as partitionwise-joinable either, before or after
that change. Richard did something about the inner-join case just
recently, in 9b282a935; but it seems we have more to do.

The reason the inner-join case is/was problematic is that we form
the equivalence class {a.x, b.y, constant} and then decide we can
enforce it with the two scan-level restrictions a.x = constant and
b.y = constant. So the a.x = b.y join condition is discarded,
just as in the left-join case. 9b282a935 fixed that by making
have_partkey_equi_join check to see if the partition keys are known
equal within any equivalence class. That doesn't help for the
left-join case, because the two Vars *aren't* "known equal" in this
case.

Not sure about a nice fix for this. I certainly don't want to revert
to the old kluge where we were carrying around redundant join clauses
as part of the main joinclause lists. I guess one idea could be to
not throw those clauses away completely, but save them aside somewhere
that would only be consulted by have_partkey_equi_join. That feels
ugly too, however. (In particular, I fear we might have to duplicate
a lot of the deconstruct_jointree logic to figure out which join level
to consider a particular saved-aside clause at.)

Right now, what we're doing in the left-join case is to form two
separate equivalence classes {a.x, constant} and {b.y, constant}
which do not get merged because while the Consts are equal(), they
are marked with different JoinDomains. I wonder whether we could
improve that now that Vars are marked with varnullingrels. That is,
perhaps we could go ahead and form {a.x, b.y, constant}, recognizing
that this implies a.x and b.y will be constrained to be equal
*at the scan level* but nothing is being promised about their
values post-outer-join. Above the outer join, it's impossible
to reference b.y anyway, only b.y* (where the star denotes the
varnullingrel bit for the left join), so that's not a member of
the EC and we won't make any false deductions about its value.

We'd have to rethink the business of marking Consts with different
JoinDomains. I've mostly swapped out the reasoning behind that,
so maybe there was a critical reason this approach would not work.
I'm also a little worried about the whole business of re-ordering
outer joins per identity 3 and the resulting squishiness in whether
a Var has yet been nulled by an outer join. Still, I'd rather
investigate this idea than throw in a quick-hack solution.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Masahiko Sawada 2025-10-03 21:21:16 Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c
Previous Message Tom Lane 2025-10-03 14:31:12 Re: [BUGS] BUG #11500: PRIMARY KEY index not being used