Re: JOIN vs. LEFT JOIN

From: Andreas Wenk <a(dot)wenk(at)netzmeister-st-pauli(dot)de>
To: Nico Callewaert <callewaert(dot)nico(at)telenet(dot)be>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: JOIN vs. LEFT JOIN
Date: 2009-01-28 12:12:54
Message-ID: 49804BC6.7010401@netzmeister-st-pauli.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nico Callewaert schrieb:
>>>> Nico Callewaert schrieb:
>>>>> Hi !
>>>>>
>>>>> I heard that it is always better to use a full JOIN rather than a LEFT
>>>>> JOIN, for performance reasons. Is that correct ?
>>>>
>>>> please note that a JOIN and a LEFT JOIN are tow different things.
>>>> Let's assume you have
>>>> two tables (without any sens):
>>>>
>>>> books:
>>>>
>>>> headline | content
>>>> - ----------+-------------
>>>> politics | russia
>>>> politics | middle east
>>>> politics | germany
>>>>
>>>> interests:
>>>>
>>>> content
>>>> - ---------
>>>> germany
>>>>
>>>> Then fire two select statements like this:
>>>>
>>>> #1:
>>>> SELECT a.headline,a.content,b.content as contentb
>>>> FROM books a
>>>> JOIN interests b ON a.content = b.content;
>>>>
>>>> headline | content | contentb
>>>> - ----------+---------+----------
>>>> politics | germany | germany
>>>> (1 row)
>>>>
>>>> #2:
>>>> SELECT a.headline,a.content,b.content as contentb
>>>> FROM books a
>>>> LEFT JOIN interests b on a.content = b.content;
>>>> headline | content | contentb
>>>> - ----------+-------------+----------
>>>> politics | russia |
>>>> politics | middle east |
>>>> politics | germany | germany
>>>> (3 rows)
>>>>
>>>>> But it's barely possible to use full JOINS all the time, since most of
>>>>> the lookup fields are not required.
>>>>> Example : in the customer table, the customer type is a looup field
>>>>> to a
>>>>> customer_type table. But that is not required to post the record.
>>>>> So I
>>>>> was thinking to create a record in the customer_type table with ID =
>>>>> -1. And in case the customer type is NULL, to assign it the value -1.
>>>>> That way, it will be always possible to do a full JOIN. I was
>>>>> wondering
>>>>> if that is good practice or rather nonsense ?
>>>>
>>>> Hm concerning the functionality of LEFT JOIN I do not really
>>>> understand what you wanna do
>>>> here. You created the customer_type table to have the possibility to
>>>> give more types to
>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The
>>>> statement could be like:
>>>>
>>>> SELECT name a
>>>> FROM customer a, cutomer_type b
>>>> WHERE a.id = b.id
>>>> AND b.type = 'super customer'
>>>>
>>>> Or not?
>>>>
>>>> Cheers
>>>>
>>>> Andy
>>>
>>>
>>> Hi Andreas,
>>>
>>> Thanks for the reply. I know the difference between JOIN and LEFT JOIN.
>>> The question was just if there is a performance hit between the 2 ?
>>>
>>> I never use constructs like this : SELECT name a
>>>> FROM customer a, cutomer_type b
>>>> WHERE a.id = b.id
>>>> AND b.type = 'super customer'
>>>
>>> Always Joins.
>>> I will try to clarify. The current situation is : the customer_type has
>>> no value, so = NULL, no problem with LEFT JOIN.
>>
>> Why do you need an empty entry in this table? Your application should
>> take care, that this
>> does not happen ...
>>
>>> But I'm wondering with tables that has thousands of records, if LEFT
>>> JOINS are performing well ?
>>
>> EXPLAIN ANALYZE with my examples:
>>
>> #1
>> test=# explain analyze select a.headline,a.content,b.content as
>> contentb from books a join
>> interests b on a.content = b.content;
>> QUERY PLAN
>>
>> -
>> ------------------------------------------------------------------------------------------------------------------
>>
>> Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual
>> time=0.051..0.058 rows=1 loops=1)
>> Hash Cond: (("outer".content)::text = ("inner".content)::text)
>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516)
>> (actual
>> time=0.007..0.010 rows=1 loops=1)
>> -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual
>> time=0.026..0.026 rows=3 loops=1)
>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032)
>> (actual
>> time=0.004..0.013 rows=3 loops=1)
>> Total runtime: 0.094 ms
>> (6 rows)
>>
>>
>> #2
>> test=# explain analyze select a.headline,a.content,b.content as
>> contentb from books a
>> left join interests b on a.content = b.content;
>> QUERY PLAN
>>
>> -
>> ----------------------------------------------------------------------------------------------------------------------
>>
>> Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual
>> time=0.038..0.062 rows=3
>> loops=1)
>> Hash Cond: (("outer".content)::text = ("inner".content)::text)
>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual
>> time=0.007..0.014 rows=3 loops=1)
>> -> Hash (cost=11.40..11.40 rows=140 width=516) (actual
>> time=0.015..0.015 rows=1 loops=1)
>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140
>> width=516) (actual
>> time=0.004..0.007 rows=1 loops=1)
>> Total runtime: 0.102 ms
>> (6 rows)
>>
>> I fired each statement ten times. But seriously - this is more or less
>> just a hint that a
>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it
>> depend on which execution
>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE
>> with your statements.
>>
>> Sorry for the big post with much output ...
>>
>> Cheers
>>
>> Andy
>>
>
>
> Hi Andy,
>
> Thanks again for the fast reply !
>
>> Why do you need an empty entry in this table? Your application should
>> take care, that this
>> does not happen ...
>
> That is the whole point of my doubts. When a user creates a new
> customer, it's not an obligation to enter a customer_type, and still
> many other fields, like tax, payment, yellow pages, payment method,
> etc... related things. So I always have lots of LEFT JOIN's in my
> queries, because of the NULL fields. You said the application should
> handle it. So you mean, even the field is not obligatory, you would put
> a value on it ? All lookup tables are of course linked through foreign
> keys to enforce data integrity. So what value would you enter ? Like 0,
> or -1 ?? And make sure there is a 0, or -1 record in the lookup table
> ? Otherwise the foreign key will complain.
> I think I almost get your point, just need a last little push :-)

Ah ok - now we are more in database/application design ;-). Here's another question: are
you really sure that the databse design is correct? Think about having all the info like
customer_type, tax, payment, yellow pages, payment method in the customer table also.
Menas - do you really need for these info more than one entry for one customer? So are you
sure you need the 1:n relation in all cases? I think not, so that would mean you put the
info about e.g tax, payment method and so on in the customer table as well and not in one
(or more ...? you were writing "all referenced tables") referenced tables. As a result you
will have less LEFT JOINS ...

Cheers

Andy

> Thanks a lot !
> Nico
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJgEvGVa7znmSP9AwRAtaZAJ4tRyx5hQ+/f6a1itFJjQygShXAGwCfai7a
7NpGhS3tegajf/i+NZ6VxF8=
=Dtb9
-----END PGP SIGNATURE-----

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Nico Callewaert 2009-01-28 12:21:33 Re: JOIN vs. LEFT JOIN
Previous Message Nico Callewaert 2009-01-28 11:59:35 Re: JOIN vs. LEFT JOIN