small table left outer join big table

From: Jie Li <jay23jack(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: small table left outer join big table
Date: 2010-12-28 10:13:40
Message-ID: AANLkTikNKpOfM=OPzfMgi9_q2tyUVSSPA0vBYaJ2_mk4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Please see the following plan:

postgres=# explain select * from small_table left outer join big_table using
(id);
QUERY PLAN
----------------------------------------------------------------------------
Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
Hash Cond: (small_table.id = big_table.id)
-> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
-> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
-> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
width=8)
(5 rows)

Here I have a puzzle, why not choose the small table to build hash table? It
can avoid multiple batches thus save significant I/O cost, isn't it?

We can perform this query in two phases:
1) inner join, using the small table to build hash table.
2) check whether each tuple in the hash table has matches before, which can
be done with another flag bit

The only compromise is the output order, due to the two separate phases. Not
sure whether the SQL standard requires it.

Thanks,
Li Jie

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Itagaki Takahiro 2010-12-28 10:23:43 Re: SQL/MED - core functionality
Previous Message Heikki Linnakangas 2010-12-28 09:59:09 Re: SQL/MED - core functionality