Re: Multiple outer join on same table

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Marco Lazzeri <marcomail(at)noze(dot)it>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Multiple outer join on same table
Date: 2004-05-14 05:40:26
Message-ID: 87lljvpnw5.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Marco Lazzeri <marcomail(at)noze(dot)it> writes:

> Hi!
> I'm searching a better (quicker) way to retrieve data as I used to do
> using the following query...
>
> ==
>
> SELECT main.codice,
> other.value AS value_one,
> other.value AS value_two
> FROM main
> LEFT OUTER JOIN other ON (main.id = other.id_main)
> LEFT OUTER JOIN other AS other2 ON (main.id = other2.id_main)
> WHERE other.type = 'type_one'
> AND other2.type = 'type_two'
> ;

a) you're better off sending the actual query rather than retyping it. I
assume you made a typo in the select column list and it should be
"other2.value AS value_two"? Also the parentheses are required on the ON
clause.

b) The WHERE clause will effectively make this a plain inner join, not an
outer join at all. Since any values that aren't found would have a NULL type
column and cause the row to not be selected.

I think the query you meant to write would be

SELECT codice,
other1.value AS value_one,
other2.value AS value_two
FROM main
LEFT OUTER JOIN other as other1 ON (main.id = other1.id_main AND type = 'type_one')
LEFT OUTER JOIN other as other2 ON (main.id = other2.id_main AND type = 'type_two)

Another way to write this query that might be faster or might not depending
would be:

SELECT codice,
(SELECT value FROM other WHERE id_main = id AND type = 'type_one') AS value_one,
(SELECT value FROM other WHERE id_main = id AND type = 'type_two') AS value_two
FROM codice

In theory the two queries really ought to always result in the same plan
because they're equivalent. However the Postgres optimizer as clever as it is
is incapable of seeing this.

The first form with the outer join leaves the optimizer with a lot more
flexibility though, including at least one plan that is effectively identical
to what the optimizer is forced to do for the second query. So really the
first one should be no worse than the second.

If you find the second faster (or if they're both still slow) you might
consider posting explain analyze output for both queries. It may be that you
have other issues preventing the optimizer from finding a good plan.

You have run analyze on these tables recently? And you vacuum regularly? And
for the second query you would really want an index on other.id_main too. For
the first one it would depend on the data in the two tables.

--
greg

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Stephan Szabo 2004-05-14 15:06:31 Re: where is this problem (trigger)
Previous Message Greg Stark 2004-05-14 05:26:09 Re: isolation level