| From: | Chris Travers <chris(at)travelamericas(dot)com> | 
|---|---|
| To: | pgsql-performance(at)postgresql(dot)org | 
| Subject: | Left joining against two empty tables makes a query SLOW | 
| Date: | 2005-07-28 05:15:47 | 
| Message-ID: | 42E86A03.1070803@travelamericas.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
Hi all;
I have a customer who currently uses an application which had become 
slow.  After doing some digging, I found the slow query:
SELECT c.accno, c.description, c.link, c.category, ac.project_id, 
p.projectnumber,
        a.department_id, d.description AS department
FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id)
        JOIN ar a ON (a.id = ac.trans_id)
        LEFT JOIN project p ON (ac.project_id = p.id)
        LEFT JOIN department d ON (d.id = a.department_id)
WHERE a.customer_id = 11373 AND a.id IN (
        SELECT max(id) FROM ar WHERE customer_id = 11373
);
(reformatted for readability)
This is taking 10 seconds to run.
Interestingly, both the project and department tables are blank, and if 
I omit them, the query becomes:
SELECT c.accno, c.description, c.link, c.category, ac.project_id
FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id)
        JOIN ar a ON (a.id = ac.trans_id)
WHERE a.customer_id = 11373 AND a.id IN (
        SELECT max(id) FROM ar WHERE customer_id = 11373
);
This takes 139ms. 1% of the previous query.
The plan for the long query is:
                                                                           
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=87337.25..106344.93 rows=41 width=118) (actual 
time=7615.843..9850.209 rows=10 loops=1)
   Hash Cond: ("outer".trans_id = "inner".max)
   ->  Merge Right Join  (cost=86620.57..100889.85 rows=947598 
width=126) (actual time=7408.830..9200.435 rows=177769 loops=1)
         Merge Cond: ("outer".id = "inner".department_id)
         ->  Index Scan using department_id_key on department d  
(cost=0.00..52.66
rows=1060 width=36) (actual time=0.090..0.090 rows=0 loops=1)
         ->  Sort  (cost=86620.57..87067.55 rows=178792 width=94) 
(actual time=7408.709..7925.843 rows=177769 loops=1)
               Sort Key: a.department_id
               ->  Merge Right Join  (cost=45871.18..46952.83 
rows=178792 width=94) (actual time=4962.122..6671.319 rows=177769 loops=1)
                     Merge Cond: ("outer".id = "inner".project_id)
                     ->  Index Scan using project_id_key on project p  
(cost=0.00..49.80 rows=800 width=36) (actual time=0.007..0.007 rows=0 
loops=1)
                     ->  Sort  (cost=45871.18..46318.16 rows=178792 
width=62) (actual time=4962.084..5475.636 rows=177769 loops=1)
                           Sort Key: ac.project_id
                           ->  Hash Join  (cost=821.20..13193.43 
rows=178792 width=62) (actual time=174.905..4295.685 rows=177769 loops=1)
                                 Hash Cond: ("outer".chart_id = "inner".id)
                                 ->  Hash Join  (cost=817.66..10508.02 
rows=178791
width=20) (actual time=173.952..2840.824 rows=177769 loops=1)
                                       Hash Cond: ("outer".trans_id = 
"inner".id)
                                       ->  Seq Scan on acc_trans ac  
(cost=0.00..3304.38 rows=181538 width=12) (actual time=0.062..537.753 
rows=181322 loops=1)
                                       ->  Hash  (cost=659.55..659.55 
rows=22844 width=8) (actual time=173.625..173.625 rows=0 loops=1)
                                             ->  Seq Scan on ar a  
(cost=0.00..659.55 rows=22844 width=8) (actual time=0.022..101.828 
rows=22844 loops=1)
                                                   Filter: (customer_id 
= 11373)
                                 ->  Hash  (cost=3.23..3.23 rows=123 
width=50) (actual time=0.915..0.915 rows=0 loops=1)
                                       ->  Seq Scan on chart c  
(cost=0.00..3.23 rows=123 width=50) (actual time=0.013..0.528 rows=123 
loops=1)
   ->  Hash  (cost=716.67..716.67 rows=1 width=4) (actual 
time=129.037..129.037 rows=0 loops=1)
         ->  Subquery Scan "IN_subquery"  (cost=716.66..716.67 rows=1 
width=4) (actual time=129.017..129.025 rows=1 loops=1)
               ->  Aggregate  (cost=716.66..716.66 rows=1 width=4) 
(actual time=129.008..129.011 rows=1 loops=1)
                     ->  Seq Scan on ar  (cost=0.00..659.55 rows=22844 
width=4) (actual time=0.020..73.266 rows=22844 loops=1)
                           Filter: (customer_id = 11373)
 Total runtime: 9954.133 ms
(28 rows)
 
The shorter query's plan is:
                                                                       
QUERY PLAN
                                                                      
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=728.42..732.96 rows=8 width=50) (actual 
time=130.908..131.593 rows=10 loops=1)
   Hash Cond: ("outer".id = "inner".chart_id)
   ->  Seq Scan on chart c  (cost=0.00..3.23 rows=123 width=50) (actual 
time=0.006..0.361 rows=123 loops=1)
   ->  Hash  (cost=728.40..728.40 rows=8 width=8) (actual 
time=130.841..130.841 rows=0 loops=1)
         ->  Nested Loop  (cost=716.67..728.40 rows=8 width=8) (actual 
time=130.692..130.805 rows=10 loops=1)
               ->  Nested Loop  (cost=716.67..720.89 rows=1 width=8) 
(actual time=130.626..130.639 rows=1 loops=1)
                     ->  HashAggregate  (cost=716.67..716.67 rows=1 
width=4) (actual time=130.484..130.487 rows=1 loops=1)
                           ->  Subquery Scan "IN_subquery"  
(cost=716.66..716.67 rows=1 width=4) (actual time=130.455..130.464 
rows=1 loops=1)
                                 ->  Aggregate  (cost=716.66..716.66 
rows=1 width=4) (actual time=130.445..130.448 rows=1 loops=1)
                                       ->  Seq Scan on ar  
(cost=0.00..659.55 rows=22844 width=4) (actual time=0.020..74.174 
rows=22844 loops=1)
                                             Filter: (customer_id = 11373)
                     ->  Index Scan using ar_id_key on ar a  
(cost=0.00..4.20 rows=1 width=4) (actual time=0.122..0.125 rows=1 loops=1)
                           Index Cond: (a.id = "outer".max)
                           Filter: (customer_id = 11373)
               ->  Index Scan using acc_trans_trans_id_key on acc_trans 
ac  (cost=0.00..7.41 rows=8 width=12) (actual time=0.051..0.097 rows=10 
loops=1)
                     Index Cond: ("outer".max = ac.trans_id)
 Total runtime: 131.879 ms
(17 rows)
 
I am not sure if I want to remove support for the other two tables 
yet.   However, I wanted to submit this here as a (possibly corner-) 
case where the plan seems to be far slower than it needs to be.
Best Wishes,
Chris Travers
Metatron Technology Consulting
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2005-07-28 06:19:19 | Re: Performance problems testing with Spamassassin 3.1.0 Bayes module. | 
| Previous Message | Matthew Schumacher | 2005-07-28 02:17:05 | Re: Performance problems testing with Spamassassin 3.1.0 |