Quick Links

Re: Left Outer Join much faster than non-outer Join?

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	rm_pg(at)cheapcomplexdevices(dot)com, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Left Outer Join much faster than non-outer Join?
Date:	2005-03-31 07:04:53
Message-ID:	424BA115.8050606@cheapcomplexdevices.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Tom Lane wrote:
> rm_pg(at)cheapcomplexdevices(dot)com writes:
>
>> select *
>> from streetname_lookup as sl
>> join city_lookup as cl on (true)
>> left outer join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
>> where str_name='alamo' and city='san antonio' and state='TX'
>>;
>
> That's a fairly odd query;

I think it's a very common type of query in data warehousing.

It's reasonably typical of a traditional star schema where
"streetname_lookup" and "city_lookup" are dimension tables
and "tlid_smaller" is the central fact table.

> why don't you have any join condition between
> streetname_lookup and city_lookup?

Those two tables shared no data. They merely get the "id"s
for looking things up in the much larger central table.

Unique indexes on the city_lookup and street_lookup make the
cartesian join harmless (they each return only 1 value); and
the huge fact table has a multi-column index that takes both
of the ids from those lookups.

With the tables I have (shown below), how else could one
efficiently fetch the data for "Main St" "San Francisco"?

streetname_lookup
(for every street name used in the country)
streetid | name | type
----------+--------+------
1 | Main | St
2 | 1st | St

> The planner won't consider Cartesian joins unless forced to, which is
> why it fails to consider the join order "((sl join cl) join ts)" unless
> you have an outer join in the mix. I think that's generally a good
> heuristic, and am disinclined to remove it ...

IMHO it's a shame it doesn't even consider it when the estimated
results are very small. I think often joins that merely look up
IDs would be useful to consider for the purpose of making potential
multi-column indexes (as shown in the previous email's explain
analyze result where the cartesian join was 30X faster than the
other approach since it could use the multi-column index on the
very large table).

Ron

In response to

Re: Left Outer Join much faster than non-outer Join? at 2005-03-31 04:07:52 from Tom Lane

Responses

Re: Left Outer Join much faster than non-outer Join? at 2005-03-31 08:15:55 from Ron Mayer
Re: Left Outer Join much faster than non-outer Join? at 2005-03-31 08:28:11 from Patrick Vedrines

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Ron Mayer	2005-03-31 08:15:55	Re: Left Outer Join much faster than non-outer Join?
Previous Message	Tom Lane	2005-03-31 04:07:52	Re: Left Outer Join much faster than non-outer Join?