Re: Fast path for empty relids in check_outerjoin_delay()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <riguo(at)pivotal(dot)io>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fast path for empty relids in check_outerjoin_delay()
Date: 2019-01-13 19:34:40
Message-ID: 24942.1547408080@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Richard Guo <riguo(at)pivotal(dot)io> writes:
> On Tue, Jan 8, 2019 at 11:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The bigger picture here, of course, is that check_outerjoin_delay's
>> API is not at all well matched to what check_equivalence_delay needs:
>> it has to make the extra bms_copy shown above, and it has no use
>> for either of the relid sets that check_outerjoin_delay wants to
>> return. I believe it's like that because check_outerjoin_delay is
>> much older than the EquivalenceClass logic, and I didn't want to
>> redesign it when I put in ECs, and I also didn't want to just
>> duplicate all that logic. But maybe it's time to put some more thought
>> into that, and try to find a refactoring of check_outerjoin_delay that
>> suits both callers better.

> I am thinking about whether we can quickly tell a qual is
> outerjoin_delayed or not by some bms_overlap operations.

> A qual is regarded as being outerjoin_delayed if:

> 1) It references any nullable rels of some OJ below this qual, and
> 2) We haven't included all the rels of that OJ in relids.

> For 1), we can collect all the nullable rels of OJs below this qual in
> deconstruct_recurse() (like nullable_baserels, initsplan.c:976) and
> bms_overlap this qual's relids with the collected nullable_relids.

Yeah, although I'm not sure that there's a good way to produce that
info in check_equivalence_delay?

> For 2), I haven't figured out a good way.

I'm not sure that there's any realistic way to do better given just
the existing SpecialJoinInfo data structures. In particular, the
fact that we consider the join_info_list to be unordered is a
problem for this code --- it means we have to make multiple passes
to ensure we've accounted for everything.

I deliberately made the SpecialJoinInfo data structure as lean as
possible when I first designed it, reasoning that that would minimize
hazards from incorrect data structure contents that might force
rewriting lots of code. But now that it's been stable a long while,
maybe it's time to try to be smarter. For example, maybe it'd be sane
to convert the list into a tree and include fields that represent the
effects of not only each outer join but all its children. Then you
could (in principle) do what check_outerjoin_delay needs to do just
by looking at the current tree element. I don't have any concrete
thoughts here, though, it's just handwaving. One issue that would
need careful consideration is how to not break outer-join reordering
rules; probably, you'd need to be sure that whatever derived data
you add is defined in a way whereby it doesn't change in the face
of legal join reorderings.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-13 20:06:00 Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Previous Message James Coleman 2019-01-13 19:12:38 Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's