Skip site navigation (1) Skip section navigation (2)

Four table join with million records - performance improvement?

From: Vijay Moses <vijay(dot)moses(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Four table join with million records - performance improvement?
Date: 2004-09-13 04:30:41
Message-ID: 17b0d951040912213055195d56@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi i have four sample tables ename, esal, edoj and esum
All of them have 1000000 records. Im running the following
query : select ename.eid, name, sal, doj, summary from
ename,esal,edoj,esum where ename.eid=esal.eid and ename.eid=edoj.eid
and ename.eid=esum.eid. Its a join of all four tables which returns
all 1 million records. The eid field in ename is a Primary Key and the
eid in all other tables are Foreign Keys. I have created an index for
all Foreign Keys. This query takes around 16 MINUTES to complete. Can
this time be reduced?
Thanks
Vijay

----------------------------------------------------------------

EXPLAIN OUTPUT

QUERY PLAN 
Merge Join  (cost=647497.97..163152572.97 rows=25000025000000 width=80) 
  Merge Cond: ("outer".eid = "inner".eid) 
  ->  Merge Join  (cost=356059.69..75361059.69 rows=5000000000 width=44) 
        Merge Cond: ("outer".eid = "inner".eid) 
        ->  Sort  (cost=150295.84..152795.84 rows=1000000 width=8) 
              Sort Key: edoj.eid 
              ->  Seq Scan on edoj  (cost=0.00..15568.00 rows=1000000 width=8) 
        ->  Sort  (cost=205763.84..208263.84 rows=1000000 width=36) 
              Sort Key: esum.eid 
              ->  Seq Scan on esum  (cost=0.00..31976.00 rows=1000000 width=36) 
  ->  Sort  (cost=291438.28..293938.29 rows=1000002 width=48) 
        Sort Key: ename.eid 
        ->  Hash Join  (cost=26683.01..107880.23 rows=1000002 width=48) 
              Hash Cond: ("outer".eid = "inner".eid) 
              ->  Seq Scan on esal  (cost=0.00..21613.01 rows=1000001 width=12) 
              ->  Hash  (cost=16370.01..16370.01 rows=1000001 width=36) 
                    ->  Seq Scan on ename  (cost=0.00..16370.01
rows=1000001 width=36)

17 row(s)

Total runtime: 181.021 ms

----------------------------------------------------------------

EXPLAIN ANALYZE OUTPUT

QUERY PLAN 

Merge Join  (cost=647497.97..163152572.97 rows=25000025000000
width=80) (actual time=505418.965..584981.013 rows=1000000 loops=1)
  Merge Cond: ("outer".eid = "inner".eid) 
  ->  Merge Join  (cost=356059.69..75361059.69 rows=5000000000
width=44) (actual time=110394.376..138177.569 rows=1000000 loops=1)
        Merge Cond: ("outer".eid = "inner".eid) 
        ->  Sort  (cost=150295.84..152795.84 rows=1000000 width=8)
(actual time=27587.622..31077.077 rows=1000000 loops=1)
              Sort Key: edoj.eid 
              ->  Seq Scan on edoj  (cost=0.00..15568.00 rows=1000000
width=8) (actual time=144.000..10445.145 rows=1000000 loops=1)
        ->  Sort  (cost=205763.84..208263.84 rows=1000000 width=36)
(actual time=82806.646..90322.943 rows=1000000 loops=1)
              Sort Key: esum.eid 
              ->  Seq Scan on esum  (cost=0.00..31976.00 rows=1000000
width=36) (actual time=20.312..29030.247 rows=1000000 loops=1)
  ->  Sort  (cost=291438.28..293938.29 rows=1000002 width=48) (actual
time=395024.482..426870.491 rows=1000001 loops=1)
        Sort Key: ename.eid 
        ->  Hash Join  (cost=26683.01..107880.23 rows=1000002
width=48) (actual time=29234.472..198064.105 rows=1000001 loops=1)
              Hash Cond: ("outer".eid = "inner".eid) 
              ->  Seq Scan on esal  (cost=0.00..21613.01 rows=1000001
width=12) (actual time=32.257..23999.163 rows=1000001 loops=1)
              ->  Hash  (cost=16370.01..16370.01 rows=1000001
width=36) (actual time=19362.095..19362.095 rows=0 loops=1)
                    ->  Seq Scan on ename  (cost=0.00..16370.01
rows=1000001 width=36) (actual time=26.744..13878.410 rows=1000001
loops=1)

Total runtime: 586226.831 ms 

18 row(s)

Total runtime: 586,435.978 ms

----------------------------------------------------------------

Responses

pgsql-performance by date

Next:From: mudfootDate: 2004-09-13 06:11:06
Subject: Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options
Previous:From: George EssigDate: 2004-09-13 03:03:57
Subject: Re: TSearch2 and optimisation ...

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group