Re: Patch to support SEMI and ANTI join removal

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-09-30 12:34:57
Message-ID: 20140930123457.GQ2084@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-10-01 01:03:35 +1300, David Rowley wrote:
> On Wed, Oct 1, 2014 at 12:01 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
>
> > On 2014-09-30 23:25:45 +1300, David Rowley wrote:
> > >
> > > I've not quite gotten my head around how we might stop the unneeded
> > > relation from being the primary path to join the other inner relations,
> > > i.e. what would stop the planner making a plan that hashed all the other
> > > relations and planned to perform a sequence scan on the relation that we
> > > have no need to scan (because the foreign key tells us the join is
> > > pointless). If we were not use use that relation then we'd just have a
> > > bunch of hash tables with no path to join them up. If we did anything to
> > > force the planner into creating a plan that would suit skipping
> > relations,
> > > then we could possibly be throwing away the optimal plan..... Right?
> >
> > I'm not 100% sure I understand your problem description, but let me
> > describe how I think this would work. During planning, you'd emit the
> > exactly same plan as you'd today, with two exceptions:
> > a) When costing a node where one side of a join is very likely to be
> > removable, you'd cost it nearly as if there wasn't a join.
> >
>
> Ok given the tables:
> create table t1 (x int primary key);
> create table t2 (y int primary key);
>
> suppose the planner came up with something like:
>
> test=# explain (costs off) select t2.* from t1 inner join t2 on t1.x=t2.y;
> QUERY PLAN
> ----------------------------
> Hash Join
> Hash Cond: (t1.x = t2.y)
> -> Seq Scan on t1
> -> Hash
> -> Seq Scan on t2
>
> If we had a foreign key...
>
> alter table t2 add constraint t2_y_fkey foreign key (y) references t1 (x);
>
> ...the join to t1 could possibly be "ignored" by the executor... but
> there's a problem as the plan states we're going to seqscan then hash that
> relation, then seqscan t1 with a hash lookup on each of t1's rows. In this
> case how would the executor skip the scan on t1? I can see how this might
> work if it was t2 that we were removing, as we'd just skip the hash lookup
> part in the hash join node.

Hm, right. But that doesn't seem like a fatal problem to me. The planner
knows about t1/t2 and Seq(t1), Seq(t2), not just Hash(Seq(t2)). So it
can tell the HashJoin node that when the 'shortcut' qualifier is true,
it should source everything from Seq(t2). Since the sequence scan
doesn't care about the node ontop that doesn't seem to be overly
dramatic?
Obviously reality makes this a bit more complicated...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2014-09-30 14:15:48 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message David Rowley 2014-09-30 12:03:35 Re: Patch to support SEMI and ANTI join removal