Re: Left Outer Join much faster than non-outer Join?

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: rm_pg(at)cheapcomplexdevices(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Left Outer Join much faster than non-outer Join?
Date: 2005-03-31 07:04:53
Message-ID: 424BA115.8050606@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> rm_pg(at)cheapcomplexdevices(dot)com writes:
>
>> select *
>> from streetname_lookup as sl
>> join city_lookup as cl on (true)
>> left outer join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
>> where str_name='alamo' and city='san antonio' and state='TX'
>>;
>
> That's a fairly odd query;

I think it's a very common type of query in data warehousing.

It's reasonably typical of a traditional star schema where
"streetname_lookup" and "city_lookup" are dimension tables
and "tlid_smaller" is the central fact table.

> why don't you have any join condition between
> streetname_lookup and city_lookup?

Those two tables shared no data. They merely get the "id"s
for looking things up in the much larger central table.

Unique indexes on the city_lookup and street_lookup make the
cartesian join harmless (they each return only 1 value); and
the huge fact table has a multi-column index that takes both
of the ids from those lookups.

With the tables I have (shown below), how else could one
efficiently fetch the data for "Main St" "San Francisco"?

streetname_lookup
(for every street name used in the country)
streetid | name | type
----------+--------+------
1 | Main | St
2 | 1st | St

city_lookup
(for every city name used in the country)
cityid | name | state
--------+---------+------
1 | Boston | MA
2 | Alameda| CA

tlid_smaller
(containing a record for every city block in the country)
city_id | street_id | addresses | demographics, etc.
--------+------------+-----------+----------------------
1 | 1 | 100 block | [lots of columns]
1 | 1 | 200 block | [lots of columns]
1 | 1 | 300 block | [lots of columns]
1 | 2 | 100 block | [lots of columns]
1 | 2 | 100 block | [lots of columns]

> The planner won't consider Cartesian joins unless forced to, which is
> why it fails to consider the join order "((sl join cl) join ts)" unless
> you have an outer join in the mix. I think that's generally a good
> heuristic, and am disinclined to remove it ...

IMHO it's a shame it doesn't even consider it when the estimated
results are very small. I think often joins that merely look up
IDs would be useful to consider for the purpose of making potential
multi-column indexes (as shown in the previous email's explain
analyze result where the cartesian join was 30X faster than the
other approach since it could use the multi-column index on the
very large table).

Ron

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ron Mayer 2005-03-31 08:15:55 Re: Left Outer Join much faster than non-outer Join?
Previous Message Tom Lane 2005-03-31 04:07:52 Re: Left Outer Join much faster than non-outer Join?