Re: JOIN vs. LEFT JOIN

From: "Nico Callewaert" <callewaert(dot)nico(at)telenet(dot)be>
To: "Andreas Wenk" <a(dot)wenk(at)netzmeister-st-pauli(dot)de>
Cc: <pgsql-novice(at)postgresql(dot)org>
Subject: Re: JOIN vs. LEFT JOIN
Date: 2009-01-28 12:21:33
Message-ID: 5A5C5FB6FC174396A1B165B2B282526A@etsinformatics.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

>>>>> Nico Callewaert schrieb:
>>>>>> Hi !
>>>>>>
>>>>>> I heard that it is always better to use a full JOIN rather than a
>>>>>> LEFT
>>>>>> JOIN, for performance reasons. Is that correct ?
>>>>>
>>>>> please note that a JOIN and a LEFT JOIN are tow different things.
>>>>> Let's assume you have
>>>>> two tables (without any sens):
>>>>>
>>>>> books:
>>>>>
>>>>> headline | content
>>>>> - ----------+-------------
>>>>> politics | russia
>>>>> politics | middle east
>>>>> politics | germany
>>>>>
>>>>> interests:
>>>>>
>>>>> content
>>>>> - ---------
>>>>> germany
>>>>>
>>>>> Then fire two select statements like this:
>>>>>
>>>>> #1:
>>>>> SELECT a.headline,a.content,b.content as contentb
>>>>> FROM books a
>>>>> JOIN interests b ON a.content = b.content;
>>>>>
>>>>> headline | content | contentb
>>>>> - ----------+---------+----------
>>>>> politics | germany | germany
>>>>> (1 row)
>>>>>
>>>>> #2:
>>>>> SELECT a.headline,a.content,b.content as contentb
>>>>> FROM books a
>>>>> LEFT JOIN interests b on a.content = b.content;
>>>>> headline | content | contentb
>>>>> - ----------+-------------+----------
>>>>> politics | russia |
>>>>> politics | middle east |
>>>>> politics | germany | germany
>>>>> (3 rows)
>>>>>
>>>>>> But it's barely possible to use full JOINS all the time, since most
>>>>>> of
>>>>>> the lookup fields are not required.
>>>>>> Example : in the customer table, the customer type is a looup field
>>>>>> to a
>>>>>> customer_type table. But that is not required to post the record.
>>>>>> So I
>>>>>> was thinking to create a record in the customer_type table with ID =
>>>>>> -1. And in case the customer type is NULL, to assign it the
>>>>>> value -1.
>>>>>> That way, it will be always possible to do a full JOIN. I was
>>>>>> wondering
>>>>>> if that is good practice or rather nonsense ?
>>>>>
>>>>> Hm concerning the functionality of LEFT JOIN I do not really
>>>>> understand what you wanna do
>>>>> here. You created the customer_type table to have the possibility to
>>>>> give more types to
>>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The
>>>>> statement could be like:
>>>>>
>>>>> SELECT name a
>>>>> FROM customer a, cutomer_type b
>>>>> WHERE a.id = b.id
>>>>> AND b.type = 'super customer'
>>>>>
>>>>> Or not?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Andy
>>>>
>>>>
>>>> Hi Andreas,
>>>>
>>>> Thanks for the reply. I know the difference between JOIN and LEFT
>>>> JOIN.
>>>> The question was just if there is a performance hit between the 2 ?
>>>>
>>>> I never use constructs like this : SELECT name a
>>>>> FROM customer a, cutomer_type b
>>>>> WHERE a.id = b.id
>>>>> AND b.type = 'super customer'
>>>>
>>>> Always Joins.
>>>> I will try to clarify. The current situation is : the customer_type
>>>> has
>>>> no value, so = NULL, no problem with LEFT JOIN.
>>>
>>> Why do you need an empty entry in this table? Your application should
>>> take care, that this
>>> does not happen ...
>>>
>>>> But I'm wondering with tables that has thousands of records, if LEFT
>>>> JOINS are performing well ?
>>>
>>> EXPLAIN ANALYZE with my examples:
>>>
>>> #1
>>> test=# explain analyze select a.headline,a.content,b.content as
>>> contentb from books a join
>>> interests b on a.content = b.content;
>>> QUERY PLAN
>>>
>>> -
>>> ------------------------------------------------------------------------------------------------------------------
>>>
>>> Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual
>>> time=0.051..0.058 rows=1 loops=1)
>>> Hash Cond: (("outer".content)::text = ("inner".content)::text)
>>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516)
>>> (actual
>>> time=0.007..0.010 rows=1 loops=1)
>>> -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual
>>> time=0.026..0.026 rows=3 loops=1)
>>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032)
>>> (actual
>>> time=0.004..0.013 rows=3 loops=1)
>>> Total runtime: 0.094 ms
>>> (6 rows)
>>>
>>>
>>> #2
>>> test=# explain analyze select a.headline,a.content,b.content as
>>> contentb from books a
>>> left join interests b on a.content = b.content;
>>> QUERY PLAN
>>>
>>> -
>>> ----------------------------------------------------------------------------------------------------------------------
>>>
>>> Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual
>>> time=0.038..0.062 rows=3
>>> loops=1)
>>> Hash Cond: (("outer".content)::text = ("inner".content)::text)
>>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual
>>> time=0.007..0.014 rows=3 loops=1)
>>> -> Hash (cost=11.40..11.40 rows=140 width=516) (actual
>>> time=0.015..0.015 rows=1 loops=1)
>>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140
>>> width=516) (actual
>>> time=0.004..0.007 rows=1 loops=1)
>>> Total runtime: 0.102 ms
>>> (6 rows)
>>>
>>> I fired each statement ten times. But seriously - this is more or less
>>> just a hint that a
>>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it
>>> depend on which execution
>>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE
>>> with your statements.
>>>
>>> Sorry for the big post with much output ...
>>>
>>> Cheers
>>>
>>> Andy
>>>
>>
>>
>> Hi Andy,
>>
>> Thanks again for the fast reply !
>>
>>> Why do you need an empty entry in this table? Your application should
>>> take care, that this
>>> does not happen ...
>>
>> That is the whole point of my doubts. When a user creates a new
>> customer, it's not an obligation to enter a customer_type, and still
>> many other fields, like tax, payment, yellow pages, payment method,
>> etc... related things. So I always have lots of LEFT JOIN's in my
>> queries, because of the NULL fields. You said the application should
>> handle it. So you mean, even the field is not obligatory, you would put
>> a value on it ? All lookup tables are of course linked through foreign
>> keys to enforce data integrity. So what value would you enter ? Like 0,
>> or -1 ?? And make sure there is a 0, or -1 record in the lookup table
>> ? Otherwise the foreign key will complain.
>> I think I almost get your point, just need a last little push :-)
>
> Ah ok - now we are more in database/application design ;-). Here's another
> question: are
> you really sure that the databse design is correct? Think about having all
> the info like
> customer_type, tax, payment, yellow pages, payment method in the customer
> table also.
> Menas - do you really need for these info more than one entry for one
> customer? So are you
> sure you need the 1:n relation in all cases? I think not, so that would
> mean you put the
> info about e.g tax, payment method and so on in the customer table as well
> and not in one
> (or more ...? you were writing "all referenced tables") referenced tables.
> As a result you
> will have less LEFT JOINS ...
>
> Cheers
>
> Andy
>

Hi Andy,

I don't know what are the group policies here ? In case I have to cut a part
of the message, just tell me, in case it gets too long.
And in case it gets off topic, just send me a warning :-)

Well, yes, we need all those 1:n relationships, because customer type can be
prospect, customer, not active. Payment methods can be visa, cash, wire
transfer, etc... Yellow pages are the category numbers. VAT rules can be
21%, 6%, etc... All of these fields are feeded through drop downs in the
application, so the customer can easily select a value.

Nico

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Obe, Regina 2009-01-28 12:27:48 Re: JOIN vs. LEFT JOIN
Previous Message Andreas Wenk 2009-01-28 12:12:54 Re: JOIN vs. LEFT JOIN