Re: Making Vars outer-join aware

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
Subject: Re: Making Vars outer-join aware
Date: 2022-08-20 22:52:11
Message-ID: 3839865.1661035931@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Progress report on this ...

I've been trying to remove some of the cruftier aspects of
EquivalenceClasses (which really is the main point of this whole
effort), and gotten repeatedly blocked by the fact that the semantics
are still a bit crufty thanks to the hacking associated with outer
join identity 3. I think I see a path forward though.

To recap, the thing about identity 3 is that it says you can get
equivalent results from

(A leftjoin B on (Pab)) leftjoin C on (Pb*c)

A leftjoin (B leftjoin C on (Pbc)) on (Pab)

if Pbc is strict for B. Unlike what it says in optimizer/README,
I've written the first form as "Pb*c" to indicate that any B Vars
appearing in that clause will be marked as possibly nulled by
the A/B join. This makes the problem apparent: we cannot use
the same representation of Pbc for both join orders, because
in the second variant B's Vars are not nulled by anything.
We've been trying to get away with writing Pbc just one way,
and that leads directly to the semantic squishiness I've been
fighting.

What I'm thinking we should do about this, once we detect that
this identity is applicable, is to generate *both* forms of Pbc,
either adding or removing the varnullingrels bits depending on
which form we got from the parser. Then, when we come to forming
join paths, use the appropriate variant depending on which join
order we're considering. build_join_rel() already has the concept
that the join restriction list varies depending on which input
relations we're trying to join, so this doesn't require any
fundamental restructuring -- only a way to identify which
RestrictInfos to use or ignore for a particular join. That will
probably require some new field in RestrictInfo, but I'm not
fussed about that because there are other fields we'll be able
to remove as this work progresses.

Similarly, generate_join_implied_equalities() will need to generate
EquivalenceClass-derived join clauses with or without varnullingrels
marks, as appropriate. I'm not quite sure how to do that, but it
feels like just a small matter of programming, not a fundamental
problem with the model which is where things are right now.
We'll only need this for ECs that include source clauses coming
from a movable outer join clause (i.e., Pbc in identity 3).

An interesting point is that I think we want to force movable
outer joins into the second format for the purpose of generating
ECs, that is we want to use Pbc not Pb*c as the EC source form.
The reason for that is to allow generation of relation-scan-level
clauses from an EC, particularly an EC that includes a constant.
As an example, given

A leftjoin (B leftjoin C on (B.b = C.c)) on (A.a = B.b)
where A.a = constant

we can decide unconditionally that A.a, B.b, C.c, and the constant all
belong to the same equivalence class, and thereby generate relation
scan restrictions A.a = constant, B.b = constant, and C.c = constant.
If we start with the other join order, which will include "B.b* = C.c"
(ie Pb*c) then we'd have two separate ECs: {A.a, B.b, constant} and
{B.b*, C.c}. So we'll fail to produce any scan restriction for C, or
at least we can't do so in any principled way.

Furthermore, if the joins are done in the second order then we don't
need any additional join clauses -- both joins can act like "LEFT JOIN
ON TRUE". (Right now, we'll emit redundant B.b = C.c and A.a = B.b
join clauses in addition to the scan-level clauses, which is
inefficient.) However, if we make use of identity 3 to do the
joins in the other order, then we do need an extra join clause, like

(A leftjoin B on (true)) leftjoin C on (B.b* = C.c)

(or maybe we could just emit "B.b* IS NOT NULL" for Pb*c?)
Without any Pb*c join constraint we get wrong answers because
nulling of B fails to propagate to C.

So while there are lots of details to work out, it feels like
this line of thought can lead to something where setrefs.c
doesn't have to ignore varnullingrels mismatches (yay) and
there is no squishiness in EquivalenceClass semantics.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-08-20 23:28:59 Instrumented pages/tuples frozen in autovacuum's server log out (and VACUUM VERBOSE)
Previous Message Andres Freund 2022-08-20 21:44:47 Re: Strip -mmacosx-version-min options from plperl build