Re: JOIN vs. LEFT JOIN

From: Andreas Wenk <a(dot)wenk(at)netzmeister-st-pauli(dot)de>
To: Nico Callewaert <callewaert(dot)nico(at)telenet(dot)be>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: JOIN vs. LEFT JOIN
Date: 2009-01-28 11:46:57
Message-ID: 498045B1.60907@netzmeister-st-pauli.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nico Callewaert schrieb:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Nico Callewaert schrieb:
>>> Hi !
>>>
>>> I heard that it is always better to use a full JOIN rather than a LEFT
>>> JOIN, for performance reasons. Is that correct ?
>>
>> please note that a JOIN and a LEFT JOIN are tow different things.
>> Let's assume you have
>> two tables (without any sens):
>>
>> books:
>>
>> headline | content
>> - ----------+-------------
>> politics | russia
>> politics | middle east
>> politics | germany
>>
>> interests:
>>
>> content
>> - ---------
>> germany
>>
>> Then fire two select statements like this:
>>
>> #1:
>> SELECT a.headline,a.content,b.content as contentb
>> FROM books a
>> JOIN interests b ON a.content = b.content;
>>
>> headline | content | contentb
>> - ----------+---------+----------
>> politics | germany | germany
>> (1 row)
>>
>> #2:
>> SELECT a.headline,a.content,b.content as contentb
>> FROM books a
>> LEFT JOIN interests b on a.content = b.content;
>> headline | content | contentb
>> - ----------+-------------+----------
>> politics | russia |
>> politics | middle east |
>> politics | germany | germany
>> (3 rows)
>>
>>> But it's barely possible to use full JOINS all the time, since most of
>>> the lookup fields are not required.
>>> Example : in the customer table, the customer type is a looup field to a
>>> customer_type table. But that is not required to post the record. So I
>>> was thinking to create a record in the customer_type table with ID =
>>> -1. And in case the customer type is NULL, to assign it the value -1.
>>> That way, it will be always possible to do a full JOIN. I was wondering
>>> if that is good practice or rather nonsense ?
>>
>> Hm concerning the functionality of LEFT JOIN I do not really
>> understand what you wanna do
>> here. You created the customer_type table to have the possibility to
>> give more types to
>> one customer (1:n). But therefore you do not need a LEFT JOIN. The
>> statement could be like:
>>
>> SELECT name a
>> FROM customer a, cutomer_type b
>> WHERE a.id = b.id
>> AND b.type = 'super customer'
>>
>> Or not?
>>
>> Cheers
>>
>> Andy
>
>
> Hi Andreas,
>
> Thanks for the reply. I know the difference between JOIN and LEFT JOIN.
> The question was just if there is a performance hit between the 2 ?
>
> I never use constructs like this : SELECT name a
>> FROM customer a, cutomer_type b
>> WHERE a.id = b.id
>> AND b.type = 'super customer'
>
> Always Joins.
> I will try to clarify. The current situation is : the customer_type has
> no value, so = NULL, no problem with LEFT JOIN.

Why do you need an empty entry in this table? Your application should take care, that this
does not happen ...

> But I'm wondering with tables that has thousands of records, if LEFT
> JOINS are performing well ?

EXPLAIN ANALYZE with my examples:

#1
test=# explain analyze select a.headline,a.content,b.content as contentb from books a join
interests b on a.content = b.content;
QUERY PLAN

-
------------------------------------------------------------------------------------------------------------------
Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual time=0.051..0.058 rows=1 loops=1)
Hash Cond: (("outer".content)::text = ("inner".content)::text)
-> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) (actual
time=0.007..0.010 rows=1 loops=1)
-> Hash (cost=10.70..10.70 rows=70 width=1032) (actual time=0.026..0.026 rows=3 loops=1)
-> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual
time=0.004..0.013 rows=3 loops=1)
Total runtime: 0.094 ms
(6 rows)

#2
test=# explain analyze select a.headline,a.content,b.content as contentb from books a
left join interests b on a.content = b.content;
QUERY PLAN

-
----------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual time=0.038..0.062 rows=3
loops=1)
Hash Cond: (("outer".content)::text = ("inner".content)::text)
-> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual
time=0.007..0.014 rows=3 loops=1)
-> Hash (cost=11.40..11.40 rows=140 width=516) (actual time=0.015..0.015 rows=1 loops=1)
-> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) (actual
time=0.004..0.007 rows=1 loops=1)
Total runtime: 0.102 ms
(6 rows)

I fired each statement ten times. But seriously - this is more or less just a hint that a
LEFT JOIN is less performant tahn a JOIN. To get that exactly it depend on which execution
plan the planner is using. So the best way ist to use EXPALIN ANALYZE with your statements.

Sorry for the big post with much output ...

Cheers

Andy

> Thanks again, Nico
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJgEWwVa7znmSP9AwRAi6NAKDdU4+b+CRzEUg8WsfN2M2P9CVl0wCg3iJn
14HzK+4/IHfdcVfqY4Jbf1g=
=pNKm
-----END PGP SIGNATURE-----

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Nico Callewaert 2009-01-28 11:59:35 Re: JOIN vs. LEFT JOIN
Previous Message Nico Callewaert 2009-01-28 11:30:24 Re: JOIN vs. LEFT JOIN