Skip site navigation (1) Skip section navigation (2)

How to improve speed of 3 table join &group (HUGE tables)

From: John Major <major(at)cbio(dot)mskcc(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: How to improve speed of 3 table join &group (HUGE tables)
Date: 2007-10-18 17:01:01
Message-ID: 4717914D.1050704@cbio.mskcc.org (view raw or flat)
Thread:
Lists: pgsql-performance
I am trying to join three quite large tables, and the query is 
unbearably slow(meaning I can't get results in more than a day of 
processing).
I've tried the basic optimizations I understand, and nothing has 
improved the execute speed.... any help with this would be greatly 
appreciated


The three tables are quite large:
  sequence_fragment = 4.5 million rows
  sequence_external_info = 10million rows
  sequence_alignment = 500 million rows


The query I am attempting to run is this:

select sf.library_id, fio.clip_type , count(distinct(sa.sequence_id))
from   sequence_alignment sa,  sequence_fragment sf, 
fragment_external_info fio
where sf.seq_frag_id = fio.sequence_frag_id
and  sf.sequence_id = sa.sequence_id
group by sf.library_id, fio.clip_type


NOTES:
~there are indexes on all of the fields being joined (but not on 
library_id or clip_type ). 
~Everything has been re-analyzed post index creation
~I've tried "set enable_seqscan=off" and set (join_table_order or 
something) = 1

The explain plan is as follows:

 QUERY 
PLAN                                                                                                                                              

 ------------------------------------------------------------------------------------------------------------------------------------------------------- 

 GroupAggregate  (cost=1443436673.93..1480593403.29 rows=54 
width=16)                                                                                    

   ->  Sort  (cost=1443436673.93..1452725856.10 rows=3715672868 
width=16)                                                                                

         Sort Key: sf.library_id, 
fio.clip_type                                                                                                          

         ->  Merge Join  (cost=263624049.25..319410068.18 
rows=3715672868 
width=16)                                                                      

               Merge Cond: (sf.sequence_id = 
sa.sequence_id)                                                                                             

               ->  Sort  (cost=38102888.77..38128373.54 rows=10193906 
width=16)                                                                          

                     Sort Key: 
sf.sequence_id                                                                                                            

                     ->  Hash Join  (cost=5305576.14..36080036.76 
rows=10193906 
width=16)                                                               
                           Hash Cond: (fio.sequence_frag_id = 
sf.seq_frag_id)                                                                            

                           ->  Index Scan using 
frag_ext_info_seq_frag_id on fragment_external_info fio  
(cost=0.00..30450510.27 rows=10193906 width=12)
                           ->  Hash  (cost=5223807.54..5223807.54 
rows=4453728 
width=12)                                                                
                                 ->  Index Scan using seq_frag_seqid_ind 
on sequence_fragment sf  (cost=0.00..5223807.54 rows=4453728 
width=12)         
               ->  Sort  (cost=225521160.48..226688766.88 rows=467042560 
width=4)                                                                        

                     Sort Key: 
sa.sequence_id                                                                                                            

                     ->  Seq Scan on sequence_alignment sa  
(cost=100000000.00..110379294.60 rows=467042560 
width=4)                                    

 15 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]


Thanks in advance!
John Major


Responses

pgsql-performance by date

Next:From: Nis JørgensenDate: 2007-10-18 18:58:17
Subject: Re: How to improve speed of 3 table join &group (HUGE tables)
Previous:From: Nis JørgensenDate: 2007-10-18 09:29:08
Subject: Re: Incorrect estimates on columns

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group