Re: Optimization rules for semi and anti joins

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgreSQL(dot)org>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Optimization rules for semi and anti joins
Date: 2009-02-11 23:57:48
Message-ID: 4993119C.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote (in response to Kevin Grittner's recent issues):
>> Reflecting on this further, I suspect there are also some bugs in
>> the planner's rules about when semi/antijoins can commute with
>> other joins;
>
> After doing some math I've concluded this is in fact the case.
> Anyone want to check my work?

> Hence semijoins can be rearranged just as freely as inner joins.

Made sense. Agreed.

> A6. (A antijoin B on (Pab)) leftjoin C on (Pbc)
> = A antijoin (B leftjoin C on (Pbc)) on (Pab)
>
> The second form is in fact equivalent to null-extending the A/B
> antijoin --- the actual contents of C cannot affect the result. So
> we could just drop C altogether. (I'm not going to do anything
> about that now, but it's something to consider for the planned
> join-elimination optimization.) In the first form, if Pbc is strict
> on B then it must fail for all rows of the antijoin result so we get
> the null-extended A/B result. If Pbc is not strict then the first
> form might duplicate some rows in the antijoin result, or produce
> non-null-extended rows. So in this case the identity holds only if
> Pbc is strict, which is the same as for left joins.

How do you get the first form as a starting point? Aren't we limited
to NOT IN or NOT EXISTS clauses for B? Those can't really contribute
columns to the result can they? (I'm probably missing something
here.)

The rest of it made sense, although two identities jumped to mind
which weren't listed.

(A semijoin B on (Pab)) antijoin C on (Pac)
= (A antijoin C on (Pac)) semijoin B on (Pab)

This one turned out, on closer inspection, to be a restatement of S4.

(A semijoin B on (Pab)) antijoin C on (Pbc)
= A semijoin (B antijoin C on (Pbc)) on (Pab)

I think this one is true, and it doesn't seem to be mentioned, unless
I'm missing something. It seems potentially useful.

Hopefully I didn't miss your point entirely.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-02-12 00:07:27 Re: Strange issue with CREATE OPERATOR CLASS
Previous Message Tom Lane 2009-02-11 23:56:54 Re: A deprecation policy