Re: Performance improvement for joins where outer side is unique

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2016-03-12 14:44:20
Message-ID: CAKFQuwZKHSR8Hkc+13S-2y_Cg5O105ZvG89Q6ermc_=pzuQq0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Saturday, March 12, 2016, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On 12 March 2016 at 11:43, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us <javascript:;>>
> wrote:
> > I wrote:
> >> I wondered why, instead of inventing an extra semantics-modifying flag,
> >> we couldn't just change the jointype to *be* JOIN_SEMI when we've
> >> discovered that the inner side is unique.
> >
> > BTW, to clarify: I'm not imagining that we'd make this change in the
> > query jointree, as for example prepjointree.c might do. That would
> appear
> > to add join order constraints, which we don't want. But I'm thinking
> that
> > at the instant where we form a join Path, we could change the Path's
> > jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
> > inner side unique, rather than annotating the Path with a separate flag.
> > Then that representation is what propagates forward.
>
> Thanks for looking at this.
>
> Yes that might work, since we'd just be changing the jointype in the
> JoinPath, if that path was discarded if favour of, say the commutative
> variant, which was not "unique inner", then it shouldn't matter, as
> the join type for that path would be the original one.
>
> The thing that might matter is that, this;
>
>
> explain (costs off) select * from t1 inner join t2 on t1.id=t2.id
> QUERY PLAN
> ------------------------------
> Hash Join
> Hash Cond: (t1.id = t2.id)
> -> Seq Scan on t1
> -> Hash
> -> Seq Scan on t2
>
> could become;
>
> QUERY PLAN
> ------------------------------
> Hash Semi Join
> Hash Cond: (t1.id = t2.id)
> -> Seq Scan on t1
> -> Hash
> -> Seq Scan on t2
>
> Wouldn't that cause quite a bit of confusion? People browsing EXPLAIN
> output might be a bit confused at the lack of EXISTS/IN clause in a
> query which is showing a Semi Join. Now, we could get around that by
> adding JOIN_SEMI_INNER I guess, and just displaying that as a normal
> inner join, yet it'll behave exactly like JOIN_SEMI!
>
>
Don't the semantics of a SEMI JOIN also state that the output columns only
come from the outer relation? i.e., the inner relation doesn't contribute
either rows or columns to the final result? Or is that simply
an implementation artifact of the fact that the only current way to perform
a semi-join explicitly is via exists/in?

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2016-03-12 15:34:07 Re: [patch] Proposal for \crosstabview in psql
Previous Message Julien Rouhaud 2016-03-12 14:20:20 Re: auto_explain sample rate