Skip site navigation (1) Skip section navigation (2)

Re: JOIN vs. LEFT JOIN

From: Andreas Wenk <a(dot)wenk(at)netzmeister-st-pauli(dot)de>
To: Nico Callewaert <callewaert(dot)nico(at)telenet(dot)be>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: JOIN vs. LEFT JOIN
Date: 2009-01-28 12:41:47
Message-ID: 4980528B.7040605@netzmeister-st-pauli.de (view raw or flat)
Thread:
Lists: pgsql-novice
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nico Callewaert schrieb:
>>>>>> Nico Callewaert schrieb:
>>>>>>> Hi !
>>>>>>>
>>>>>>> I heard that it is always better to use a full JOIN rather than a
>>>>>>> LEFT
>>>>>>> JOIN, for performance reasons.  Is that correct ?
>>>>>>
>>>>>> please note that a JOIN and a LEFT JOIN are tow different things.
>>>>>> Let's assume you have
>>>>>> two tables (without any sens):
>>>>>>
>>>>>> books:
>>>>>>
>>>>>> headline |   content
>>>>>> - ----------+-------------
>>>>>> politics | russia
>>>>>> politics | middle east
>>>>>> politics | germany
>>>>>>
>>>>>> interests:
>>>>>>
>>>>>> content
>>>>>> - ---------
>>>>>> germany
>>>>>>
>>>>>> Then fire two select statements like this:
>>>>>>
>>>>>> #1:
>>>>>> SELECT a.headline,a.content,b.content as contentb
>>>>>> FROM books a
>>>>>> JOIN interests b ON a.content = b.content;
>>>>>>
>>>>>>  headline | content | contentb
>>>>>> - ----------+---------+----------
>>>>>> politics | germany | germany
>>>>>> (1 row)
>>>>>>
>>>>>> #2:
>>>>>> SELECT a.headline,a.content,b.content as contentb
>>>>>> FROM books a
>>>>>> LEFT JOIN interests b on a.content = b.content;
>>>>>> headline |   content   | contentb
>>>>>> - ----------+-------------+----------
>>>>>> politics | russia      |
>>>>>> politics | middle east |
>>>>>> politics | germany     | germany
>>>>>> (3 rows)
>>>>>>
>>>>>>> But it's barely possible to use full JOINS all the time, since
>>>>>>> most of
>>>>>>> the lookup fields are not required.
>>>>>>> Example : in the customer table, the customer type is a looup field
>>>>>>> to a
>>>>>>> customer_type table.  But that is not required to post the record.
>>>>>>> So I
>>>>>>> was thinking to create a record in the customer_type table with ID =
>>>>>>> -1.  And in case the customer type is NULL, to assign it the
>>>>>>> value -1.
>>>>>>> That way, it will be always possible to do a full JOIN.  I was
>>>>>>> wondering
>>>>>>> if that is good practice or rather nonsense ?
>>>>>>
>>>>>> Hm concerning the functionality of LEFT JOIN I do not really
>>>>>> understand what you wanna do
>>>>>> here. You created the customer_type table to have the possibility to
>>>>>> give more types to
>>>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The
>>>>>> statement could be like:
>>>>>>
>>>>>> SELECT name a
>>>>>> FROM customer a, cutomer_type b
>>>>>> WHERE a.id = b.id
>>>>>> AND b.type = 'super customer'
>>>>>>
>>>>>> Or not?
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Andy
>>>>>
>>>>>
>>>>> Hi Andreas,
>>>>>
>>>>> Thanks for the reply.  I know the difference between JOIN and LEFT
>>>>> JOIN.
>>>>> The question was just if there is a performance hit between the 2 ?
>>>>>
>>>>> I never use constructs like this : SELECT name a
>>>>>> FROM customer a, cutomer_type b
>>>>>> WHERE a.id = b.id
>>>>>> AND b.type = 'super customer'
>>>>>
>>>>> Always Joins.
>>>>> I will try to clarify.  The current situation is : the
>>>>> customer_type has
>>>>> no value, so = NULL, no problem with LEFT JOIN.
>>>>
>>>> Why do you need an empty entry in this table? Your application should
>>>> take care, that this
>>>> does not happen ...
>>>>
>>>>> But I'm wondering with tables that has thousands of records, if LEFT
>>>>> JOINS are performing well ?
>>>>
>>>> EXPLAIN ANALYZE with my examples:
>>>>
>>>> #1
>>>> test=# explain analyze select a.headline,a.content,b.content as
>>>> contentb from books a join
>>>> interests b on a.content = b.content;
>>>>                                                    QUERY PLAN
>>>>
>>>> -
>>>> ------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> Hash Join  (cost=10.88..23.67 rows=70 width=1548) (actual
>>>> time=0.051..0.058 rows=1 loops=1)
>>>>   Hash Cond: (("outer".content)::text = ("inner".content)::text)
>>>>   ->  Seq Scan on interests b  (cost=0.00..11.40 rows=140 width=516)
>>>> (actual
>>>> time=0.007..0.010 rows=1 loops=1)
>>>>   ->  Hash  (cost=10.70..10.70 rows=70 width=1032) (actual
>>>> time=0.026..0.026 rows=3 loops=1)
>>>>         ->  Seq Scan on books a  (cost=0.00..10.70 rows=70 width=1032)
>>>> (actual
>>>> time=0.004..0.013 rows=3 loops=1)
>>>> Total runtime: 0.094 ms
>>>> (6 rows)
>>>>
>>>>
>>>> #2
>>>> test=# explain analyze select a.headline,a.content,b.content as
>>>> contentb from books a
>>>> left join interests b on a.content = b.content;
>>>>                                                      QUERY PLAN
>>>>
>>>> -
>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> Hash Left Join  (cost=11.75..23.72 rows=70 width=1548) (actual
>>>> time=0.038..0.062 rows=3
>>>> loops=1)
>>>>   Hash Cond: (("outer".content)::text = ("inner".content)::text)
>>>>   ->  Seq Scan on books a  (cost=0.00..10.70 rows=70 width=1032)
>>>> (actual
>>>> time=0.007..0.014 rows=3 loops=1)
>>>>   ->  Hash  (cost=11.40..11.40 rows=140 width=516) (actual
>>>> time=0.015..0.015 rows=1 loops=1)
>>>>         ->  Seq Scan on interests b  (cost=0.00..11.40 rows=140
>>>> width=516) (actual
>>>> time=0.004..0.007 rows=1 loops=1)
>>>> Total runtime: 0.102 ms
>>>> (6 rows)
>>>>
>>>> I fired each statement ten times. But seriously - this is more or less
>>>> just a hint that a
>>>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it
>>>> depend on which execution
>>>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE
>>>> with your statements.
>>>>
>>>> Sorry for the big post with much output ...
>>>>
>>>> Cheers
>>>>
>>>> Andy
>>>>
>>>
>>>
>>> Hi Andy,
>>>
>>> Thanks again for the fast reply !
>>>
>>>> Why do you need an empty entry in this table? Your application should
>>>> take care, that this
>>>> does not happen ...
>>>
>>> That is the whole point of my doubts.  When a user creates a new
>>> customer, it's not an obligation to enter a customer_type, and still
>>> many other fields, like tax, payment, yellow pages, payment method,
>>> etc... related things.  So I always have lots of LEFT JOIN's in my
>>> queries, because of the NULL fields.  You said the application should
>>> handle it.  So you mean, even the field is not obligatory, you would put
>>> a value on it ?  All lookup tables are of course linked through foreign
>>> keys to enforce data integrity. So what value would you enter ?  Like 0,
>>> or -1 ??  And make sure there is a 0, or -1 record in the lookup table
>>> ?  Otherwise the foreign key will complain.
>>> I think I almost get your point, just need a last little push :-)
>>
>> Ah ok - now we are more in database/application design ;-). Here's
>> another question: are
>> you really sure that the databse design is correct? Think about having
>> all the info like
>> customer_type, tax, payment, yellow pages, payment method in the
>> customer table also.
>> Menas - do you really need for these info more than one entry for one
>> customer? So are you
>> sure you need the 1:n relation in all cases? I think not, so that
>> would mean you put the
>> info about e.g tax, payment method and so on in the customer table as
>> well and not in one
>> (or more ...? you were writing "all referenced tables") referenced
>> tables. As a result you
>> will have less LEFT JOINS ...
>>
>> Cheers
>>
>> Andy
>>
> 
> Hi Andy,
> 
> I don't know what are the group policies here ? In case I have to cut a
> part of the message, just tell me, in case it gets too long.
> And in case it gets off topic, just send me a warning :-)

don't worry ;-)

> Well, yes, we need all those 1:n relationships, because customer type
> can be prospect, customer, not active.  Payment methods can be visa,
> cash, wire transfer, etc...  Yellow pages are the category numbers.  VAT
> rules can be 21%, 6%, etc...  All of these fields are feeded through

Ok please don't get me wrong for asking again: one customer can have two tax rates? Why?
Because he's living in the USA and has a taxrate of 25% and lives also in Germany and has
19%? Hm ... strange ... ;-) . And also the custumer is 'prospect' OR 'not active' ... but
not both ... ;-)

> drop downs in the application, so the customer can easily select a value.

So all these dropdowns are multiselect ...?

I am not sure if I can help you with this. My idea is really to think about the design of
the application and / or the database ... again - please don't get me wrong ...
> Nico
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJgFKLVa7znmSP9AwRAlGkAJ9imdvkfk3iTw146Eru3BFB1llPZQCgqPXb
fYhH9zDowHAtjAYjfAHlhcM=
=Kpf5
-----END PGP SIGNATURE-----

In response to

Responses

pgsql-novice by date

Next:From: Andreas WenkDate: 2009-01-28 12:44:31
Subject: Re: JOIN vs. LEFT JOIN
Previous:From: Obe, ReginaDate: 2009-01-28 12:27:48
Subject: Re: JOIN vs. LEFT JOIN

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group