Re: OUTER JOIN with filter

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Nicolas Fertig <nfertig(at)swissnow(dot)ch>, pgsql-sql(at)postgresql(dot)org
Subject: Re: OUTER JOIN with filter
Date: 2003-03-03 19:17:19
Message-ID: 15484.1046719039@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> For that matter, in a few queries I've found that it pays to force the
> planner's hand by repeating some of the clauses in the WHERE clause in the
> JOIN as well, as:

> SELECT a.x, b.y, c.z
> FROM a JOIN b ON (a.id = b.a_id AND b.type = 'm')
> LEFT OUTER JOIN c ON (b.id = c.b_id)
> WHERE b.type = 'm'

> The logic is fine. The issue comes where the outer joined table is several
> times larger than the main queried table. The planner should recognize the
> possibility of filtering the records in the joined table before joining in
> order to minimize the join operation.

AFAIR, it's always done that; certainly the above query looks quite
redundant to me. Note though that pushing down the WHERE clause is only
legal because b is on the outer side of the join.

A comparable query in the 7.3 regression database is

regression=# explain select * from
regression-# tenk1 a join tenk1 b on (a.unique1 = b.unique1)
regression-# left join tenk1 c on (b.unique2 = c.unique2)
regression-# where b.thousand = 999;
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..572.49 rows=10 width=732)
-> Nested Loop (cost=0.00..542.55 rows=10 width=488)
-> Seq Scan on tenk1 b (cost=0.00..483.00 rows=10 width=244)
Filter: (thousand = 999)
-> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..6.01 rows=1 width=244)
Index Cond: (a.unique1 = "outer".unique1)
-> Index Scan using tenk1_unique2 on tenk1 c (cost=0.00..3.01 rows=1 width=244)
Index Cond: ("outer".unique2 = c.unique2)
(8 rows)

You can see that the condition on b.thousand does get pushed down to the
bottom scan. (There's no index on b.thousand, so we don't get an
indexscan --- but we would if there were an index...)

regards, tom lane

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Greg Stark 2003-03-03 20:46:17 Re: HardCORE QUERY HELP!!!
Previous Message Chris 2003-03-03 18:54:34 LIMIT and SUBQUERIES