Re: A bug in make_outerjoininfo

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A bug in make_outerjoininfo
Date: 2023-02-07 20:15:53
Message-ID: 3261423.1675800953@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> In cases where we have any clauses between two outer joins, these
> clauses should be treated as degenerate clauses in the upper OJ, and
> they may prevent us from re-ordering the two outer joins. Previously we
> have the flag 'delay_upper_joins' to help avoid the re-ordering in such
> cases.

> In b448f1c8 remove_useless_result_rtes will remove useless FromExprs and
> merge its quals up to parent. This makes flag 'delay_upper_joins' not
> necessary any more if the clauses between the two outer joins come from
> FromExprs. However, if the clauses between the two outer joins come
> from JoinExpr of an inner join, it seems we have holes in preserving
> ordering.

Hmm ... we'll preserve the ordering all right, but if we set commute_below
or commute_above_x bits that don't match reality then we'll have trouble
later with mis-marked varnullingrels, the same as we saw in b2d0e13a0.
I don't think you need a JoinExpr, an intermediate multi-member FromExpr
should have the same effect.

This possibility was bugging me a little bit while working on b2d0e13a0,
but I didn't have a test case showing that it was an issue, and it
doesn't seem that easy to incorporate into make_outerjoininfo's
SpecialJoinInfo-based logic. I wonder if we could do something based on
insisting that the upper OJ's relevant "min_xxxside" relids exactly match
the lower OJ's min scope, thereby proving that there's no relevant join
of any kind between them.

The main question there is whether it'd break optimization of any cases
where we need to apply multiple OJ identities to get to the most favorable
plan. I think not, as long as we compare the "min" relid sets not the
syntactic relid sets, but I've not done a careful analysis.

If that doesn't work, another idea could be to reformulate
make_outerjoininfo's loop as a re-traversal of the jointree, allowing
it to see intermediate plain joins directly. However, that still leaves
me wondering what we *do* about the intermediate joins. I don't think
we want to fail immediately on seeing one, because we could possibly
apply OJ identity 1 to get the inner join out of the way. That is:

((A leftjoin B on (Pab)) innerjoin C on (Pac)) leftjoin D on (Pbd)

initially looks like identity 3 can't apply, but apply identity 1
first:

((A innerjoin C on (Pac)) leftjoin B on (Pab)) leftjoin D on (Pbd)

and now it works (insert usual caveat about strictness):

(A innerjoin C on (Pac)) leftjoin (B leftjoin D on (Pbd)) on (Pab)

and you can even go back the other way:

(A leftjoin (B leftjoin D on (Pbd)) on (Pab)) innerjoin C on (Pac)

So it's actually possible to push an innerjoin out of the identity-3
nest in either direction, and we don't want to lose that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-02-07 20:18:32 Re: pglz compression performance, take two
Previous Message Sergei Kornilov 2023-02-07 20:14:52 Re:pg_stat_statements and "IN" conditions