Four table join with million records - performance improvement?

From: Vijay Moses <vijay(dot)moses(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Four table join with million records - performance improvement?
Date: 2004-09-13 04:30:41
Message-ID: 17b0d951040912213055195d56@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi i have four sample tables ename, esal, edoj and esum
All of them have 1000000 records. Im running the following
query : select ename.eid, name, sal, doj, summary from
ename,esal,edoj,esum where ename.eid=esal.eid and ename.eid=edoj.eid
and ename.eid=esum.eid. Its a join of all four tables which returns
all 1 million records. The eid field in ename is a Primary Key and the
eid in all other tables are Foreign Keys. I have created an index for
all Foreign Keys. This query takes around 16 MINUTES to complete. Can
this time be reduced?
Thanks
Vijay

----------------------------------------------------------------

EXPLAIN OUTPUT

QUERY PLAN
Merge Join (cost=647497.97..163152572.97 rows=25000025000000 width=80)
Merge Cond: ("outer".eid = "inner".eid)
-> Merge Join (cost=356059.69..75361059.69 rows=5000000000 width=44)
Merge Cond: ("outer".eid = "inner".eid)
-> Sort (cost=150295.84..152795.84 rows=1000000 width=8)
Sort Key: edoj.eid
-> Seq Scan on edoj (cost=0.00..15568.00 rows=1000000 width=8)
-> Sort (cost=205763.84..208263.84 rows=1000000 width=36)
Sort Key: esum.eid
-> Seq Scan on esum (cost=0.00..31976.00 rows=1000000 width=36)
-> Sort (cost=291438.28..293938.29 rows=1000002 width=48)
Sort Key: ename.eid
-> Hash Join (cost=26683.01..107880.23 rows=1000002 width=48)
Hash Cond: ("outer".eid = "inner".eid)
-> Seq Scan on esal (cost=0.00..21613.01 rows=1000001 width=12)
-> Hash (cost=16370.01..16370.01 rows=1000001 width=36)
-> Seq Scan on ename (cost=0.00..16370.01
rows=1000001 width=36)

17 row(s)

Total runtime: 181.021 ms

----------------------------------------------------------------

EXPLAIN ANALYZE OUTPUT

QUERY PLAN

Merge Join (cost=647497.97..163152572.97 rows=25000025000000
width=80) (actual time=505418.965..584981.013 rows=1000000 loops=1)
Merge Cond: ("outer".eid = "inner".eid)
-> Merge Join (cost=356059.69..75361059.69 rows=5000000000
width=44) (actual time=110394.376..138177.569 rows=1000000 loops=1)
Merge Cond: ("outer".eid = "inner".eid)
-> Sort (cost=150295.84..152795.84 rows=1000000 width=8)
(actual time=27587.622..31077.077 rows=1000000 loops=1)
Sort Key: edoj.eid
-> Seq Scan on edoj (cost=0.00..15568.00 rows=1000000
width=8) (actual time=144.000..10445.145 rows=1000000 loops=1)
-> Sort (cost=205763.84..208263.84 rows=1000000 width=36)
(actual time=82806.646..90322.943 rows=1000000 loops=1)
Sort Key: esum.eid
-> Seq Scan on esum (cost=0.00..31976.00 rows=1000000
width=36) (actual time=20.312..29030.247 rows=1000000 loops=1)
-> Sort (cost=291438.28..293938.29 rows=1000002 width=48) (actual
time=395024.482..426870.491 rows=1000001 loops=1)
Sort Key: ename.eid
-> Hash Join (cost=26683.01..107880.23 rows=1000002
width=48) (actual time=29234.472..198064.105 rows=1000001 loops=1)
Hash Cond: ("outer".eid = "inner".eid)
-> Seq Scan on esal (cost=0.00..21613.01 rows=1000001
width=12) (actual time=32.257..23999.163 rows=1000001 loops=1)
-> Hash (cost=16370.01..16370.01 rows=1000001
width=36) (actual time=19362.095..19362.095 rows=0 loops=1)
-> Seq Scan on ename (cost=0.00..16370.01
rows=1000001 width=36) (actual time=26.744..13878.410 rows=1000001
loops=1)

Total runtime: 586226.831 ms

18 row(s)

Total runtime: 586,435.978 ms

----------------------------------------------------------------

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message mudfoot 2004-09-13 06:11:06 Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options
Previous Message George Essig 2004-09-13 03:03:57 Re: TSearch2 and optimisation ...