Re: Making Vars outer-join aware

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hans Buschmann <buschmann(at)nidsa(dot)net>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
Subject: Re: Making Vars outer-join aware
Date: 2023-01-24 19:31:27
Message-ID: 2291988.1674588687@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Hans Buschmann <buschmann(at)nidsa(dot)net> writes:
>> I just noticed your new efforts in this area.
>> I wanted to recurr to my old thread [1] considering constant propagation of quals.
>> [1] https://www.postgresql.org/message-id/1571413123735.26467@nidsa.net

> Yeah, this patch series is not yet quite up to the point of improving
> that. That area is indeed the very next thing I want to work on, and
> I did spend some effort on it last month, but I ran out of time to get
> it working. Maybe we'll have something there for v17.

BTW, to clarify what's going on there: what I want to do is allow
the regular equivalence-class machinery to handle deductions from
equality operators appearing in LEFT JOIN ON clauses (maybe full
joins too, but I'd be satisfied if it works for one-sided outer
joins). I'd originally hoped that distinguishing pre-nulled from
post-nulled variables would be enough to make that safe, but it's
not. Here's an example:

select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);

If we turn the generic equivclass.c logic loose on these clauses,
it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
the scan of t2, which is even better (since we might be able to turn
that into an indexscan qual). However, it will also try to apply
t1.x = 1 at the scan of t1, and that's just wrong, because that
will eliminate t1 rows that should come through with null extension.

My current plan for making this work is to define
EquivalenceClass-generated clauses as applying within "join domains",
which are sets of inner-joined relations, and in the case of a one-sided
outer join then the join itself belongs to the same join domain as its
right-hand side --- but not to the join domain of its left-hand side.
This would allow us to push EC clauses from an outer join's qual down
into the RHS, but not into the LHS, and then anything leftover would
still have to be applied at the join. In this example we'd have to
apply t1.x = t2.y or t1.x = 1, but not both, at the join.

I got as far as inventing join domains, in the 0012 patch of this
series, but I haven't quite finished puzzling out the clause application
rules that would be needed for this scenario. Ordinarily an EC
containing a constant would be fully enforced at the scan level
(i.e., apply t1.x = 1 and t2.y = 1 at scan level) and generate no
additional clauses at join level; but that clearly doesn't work
anymore when some of the scans are outside the join domain.
I think that the no-constant case might need to be different too.
I have some WIP code but nothing I can show.

Also, this doesn't seem to help for full joins. We can treat the
two sides as each being their own join domains, but then the join's
own ON clause doesn't belong to either one, since we can't throw
away rows from either side on the basis of a restriction from ON.
So it seems like we'll still need ad-hoc logic comparable to
reconsider_full_join_clause, if we want to preserve that optimization.
I'm only mildly discontented with that, but still discontented.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2023-01-24 19:47:53 Re: Making Vars outer-join aware
Previous Message Robert Haas 2023-01-24 19:21:15 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation