Re: JOIN vs. LEFT JOIN

From: Andreas Wenk <a(dot)wenk(at)netzmeister-st-pauli(dot)de>
To: Nico Callewaert <callewaert(dot)nico(at)telenet(dot)be>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: JOIN vs. LEFT JOIN
Date: 2009-01-28 12:41:47
Message-ID: 4980528B.7040605@netzmeister-st-pauli.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nico Callewaert schrieb:
>>>>>> Nico Callewaert schrieb:
>>>>>>> Hi !
>>>>>>>
>>>>>>> I heard that it is always better to use a full JOIN rather than a
>>>>>>> LEFT
>>>>>>> JOIN, for performance reasons. Is that correct ?
>>>>>>
>>>>>> please note that a JOIN and a LEFT JOIN are tow different things.
>>>>>> Let's assume you have
>>>>>> two tables (without any sens):
>>>>>>
>>>>>> books:
>>>>>>
>>>>>> headline | content
>>>>>> - ----------+-------------
>>>>>> politics | russia
>>>>>> politics | middle east
>>>>>> politics | germany
>>>>>>
>>>>>> interests:
>>>>>>
>>>>>> content
>>>>>> - ---------
>>>>>> germany
>>>>>>
>>>>>> Then fire two select statements like this:
>>>>>>
>>>>>> #1:
>>>>>> SELECT a.headline,a.content,b.content as contentb
>>>>>> FROM books a
>>>>>> JOIN interests b ON a.content = b.content;
>>>>>>
>>>>>> headline | content | contentb
>>>>>> - ----------+---------+----------
>>>>>> politics | germany | germany
>>>>>> (1 row)
>>>>>>
>>>>>> #2:
>>>>>> SELECT a.headline,a.content,b.content as contentb
>>>>>> FROM books a
>>>>>> LEFT JOIN interests b on a.content = b.content;
>>>>>> headline | content | contentb
>>>>>> - ----------+-------------+----------
>>>>>> politics | russia |
>>>>>> politics | middle east |
>>>>>> politics | germany | germany
>>>>>> (3 rows)
>>>>>>
>>>>>>> But it's barely possible to use full JOINS all the time, since
>>>>>>> most of
>>>>>>> the lookup fields are not required.
>>>>>>> Example : in the customer table, the customer type is a looup field
>>>>>>> to a
>>>>>>> customer_type table. But that is not required to post the record.
>>>>>>> So I
>>>>>>> was thinking to create a record in the customer_type table with ID =
>>>>>>> -1. And in case the customer type is NULL, to assign it the
>>>>>>> value -1.
>>>>>>> That way, it will be always possible to do a full JOIN. I was
>>>>>>> wondering
>>>>>>> if that is good practice or rather nonsense ?
>>>>>>
>>>>>> Hm concerning the functionality of LEFT JOIN I do not really
>>>>>> understand what you wanna do
>>>>>> here. You created the customer_type table to have the possibility to
>>>>>> give more types to
>>>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The
>>>>>> statement could be like:
>>>>>>
>>>>>> SELECT name a
>>>>>> FROM customer a, cutomer_type b
>>>>>> WHERE a.id = b.id
>>>>>> AND b.type = 'super customer'
>>>>>>
>>>>>> Or not?
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Andy
>>>>>
>>>>>
>>>>> Hi Andreas,
>>>>>
>>>>> Thanks for the reply. I know the difference between JOIN and LEFT
>>>>> JOIN.
>>>>> The question was just if there is a performance hit between the 2 ?
>>>>>
>>>>> I never use constructs like this : SELECT name a
>>>>>> FROM customer a, cutomer_type b
>>>>>> WHERE a.id = b.id
>>>>>> AND b.type = 'super customer'
>>>>>
>>>>> Always Joins.
>>>>> I will try to clarify. The current situation is : the
>>>>> customer_type has
>>>>> no value, so = NULL, no problem with LEFT JOIN.
>>>>
>>>> Why do you need an empty entry in this table? Your application should
>>>> take care, that this
>>>> does not happen ...
>>>>
>>>>> But I'm wondering with tables that has thousands of records, if LEFT
>>>>> JOINS are performing well ?
>>>>
>>>> EXPLAIN ANALYZE with my examples:
>>>>
>>>> #1
>>>> test=# explain analyze select a.headline,a.content,b.content as
>>>> contentb from books a join
>>>> interests b on a.content = b.content;
>>>> QUERY PLAN
>>>>
>>>> -
>>>> ------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual
>>>> time=0.051..0.058 rows=1 loops=1)
>>>> Hash Cond: (("outer".content)::text = ("inner".content)::text)
>>>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516)
>>>> (actual
>>>> time=0.007..0.010 rows=1 loops=1)
>>>> -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual
>>>> time=0.026..0.026 rows=3 loops=1)
>>>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032)
>>>> (actual
>>>> time=0.004..0.013 rows=3 loops=1)
>>>> Total runtime: 0.094 ms
>>>> (6 rows)
>>>>
>>>>
>>>> #2
>>>> test=# explain analyze select a.headline,a.content,b.content as
>>>> contentb from books a
>>>> left join interests b on a.content = b.content;
>>>> QUERY PLAN
>>>>
>>>> -
>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual
>>>> time=0.038..0.062 rows=3
>>>> loops=1)
>>>> Hash Cond: (("outer".content)::text = ("inner".content)::text)
>>>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032)
>>>> (actual
>>>> time=0.007..0.014 rows=3 loops=1)
>>>> -> Hash (cost=11.40..11.40 rows=140 width=516) (actual
>>>> time=0.015..0.015 rows=1 loops=1)
>>>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140
>>>> width=516) (actual
>>>> time=0.004..0.007 rows=1 loops=1)
>>>> Total runtime: 0.102 ms
>>>> (6 rows)
>>>>
>>>> I fired each statement ten times. But seriously - this is more or less
>>>> just a hint that a
>>>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it
>>>> depend on which execution
>>>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE
>>>> with your statements.
>>>>
>>>> Sorry for the big post with much output ...
>>>>
>>>> Cheers
>>>>
>>>> Andy
>>>>
>>>
>>>
>>> Hi Andy,
>>>
>>> Thanks again for the fast reply !
>>>
>>>> Why do you need an empty entry in this table? Your application should
>>>> take care, that this
>>>> does not happen ...
>>>
>>> That is the whole point of my doubts. When a user creates a new
>>> customer, it's not an obligation to enter a customer_type, and still
>>> many other fields, like tax, payment, yellow pages, payment method,
>>> etc... related things. So I always have lots of LEFT JOIN's in my
>>> queries, because of the NULL fields. You said the application should
>>> handle it. So you mean, even the field is not obligatory, you would put
>>> a value on it ? All lookup tables are of course linked through foreign
>>> keys to enforce data integrity. So what value would you enter ? Like 0,
>>> or -1 ?? And make sure there is a 0, or -1 record in the lookup table
>>> ? Otherwise the foreign key will complain.
>>> I think I almost get your point, just need a last little push :-)
>>
>> Ah ok - now we are more in database/application design ;-). Here's
>> another question: are
>> you really sure that the databse design is correct? Think about having
>> all the info like
>> customer_type, tax, payment, yellow pages, payment method in the
>> customer table also.
>> Menas - do you really need for these info more than one entry for one
>> customer? So are you
>> sure you need the 1:n relation in all cases? I think not, so that
>> would mean you put the
>> info about e.g tax, payment method and so on in the customer table as
>> well and not in one
>> (or more ...? you were writing "all referenced tables") referenced
>> tables. As a result you
>> will have less LEFT JOINS ...
>>
>> Cheers
>>
>> Andy
>>
>
> Hi Andy,
>
> I don't know what are the group policies here ? In case I have to cut a
> part of the message, just tell me, in case it gets too long.
> And in case it gets off topic, just send me a warning :-)

don't worry ;-)

> Well, yes, we need all those 1:n relationships, because customer type
> can be prospect, customer, not active. Payment methods can be visa,
> cash, wire transfer, etc... Yellow pages are the category numbers. VAT
> rules can be 21%, 6%, etc... All of these fields are feeded through

Ok please don't get me wrong for asking again: one customer can have two tax rates? Why?
Because he's living in the USA and has a taxrate of 25% and lives also in Germany and has
19%? Hm ... strange ... ;-) . And also the custumer is 'prospect' OR 'not active' ... but
not both ... ;-)

> drop downs in the application, so the customer can easily select a value.

So all these dropdowns are multiselect ...?

I am not sure if I can help you with this. My idea is really to think about the design of
the application and / or the database ... again - please don't get me wrong ...
> Nico
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJgFKLVa7znmSP9AwRAlGkAJ9imdvkfk3iTw146Eru3BFB1llPZQCgqPXb
fYhH9zDowHAtjAYjfAHlhcM=
=Kpf5
-----END PGP SIGNATURE-----

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Andreas Wenk 2009-01-28 12:44:31 Re: JOIN vs. LEFT JOIN
Previous Message Obe, Regina 2009-01-28 12:27:48 Re: JOIN vs. LEFT JOIN