Re: Left joining against two empty tables makes a query SLOW

From: Gnanavel S <s(dot)gnanavel(at)gmail(dot)com>
To: Chris Travers <chris(at)travelamericas(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Left joining against two empty tables makes a query SLOW
Date: 2005-07-28 09:55:57
Message-ID: eec3b03c050728025527b91961@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 7/28/05, Chris Travers <chris(at)travelamericas(dot)com> wrote:
>
> Hi all;
>
> I have a customer who currently uses an application which had become
> slow. After doing some digging, I found the slow query:
>
> SELECT c.accno, c.description, c.link, c.category, ac.project_id,
> p.projectnumber,
> a.department_id, d.description AS department
> FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id <http://c.id>)
> JOIN ar a ON (a.id <http://a.id> = ac.trans_id)
> LEFT JOIN project p ON (ac.project_id = p.id <http://p.id>)
> LEFT JOIN department d ON (d.id <http://d.id> = a.department_id)
> WHERE a.customer_id = 11373 AND a.id <http://a.id> IN (
> SELECT max(id) FROM ar WHERE customer_id = 11373
> );
>
> (reformatted for readability)
>
> This is taking 10 seconds to run.
>
> Interestingly, both the project and department tables are blank, and if
> I omit them, the query becomes:
> SELECT c.accno, c.description, c.link, c.category, ac.project_id
> FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id <http://c.id>)
> JOIN ar a ON (a.id <http://a.id> = ac.trans_id)
> WHERE a.customer_id = 11373 AND a.id <http://a.id> IN (
> SELECT max(id) FROM ar WHERE customer_id = 11373
> );
>
> This takes 139ms. 1% of the previous query.
>
> The plan for the long query is:
>
>
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Hash IN Join (cost=87337.25..106344.93 rows=41 width=118) (actual
> time=7615.843..9850.209 rows=10 loops=1)
> Hash Cond: ("outer".trans_id = "inner".max)
> -> Merge Right Join (cost=86620.57..100889.85 rows=947598
> width=126) (actual time=7408.830..9200.435 rows=177769 loops=1)
> Merge Cond: ("outer".id = "inner".department_id)
> -> Index Scan using department_id_key on department d
> (cost=0.00..52.66
> rows=1060 width=36) (actual time=0.090..0.090 rows=0 loops=1)

vacuum & reindex the department and project table as the planner expects
there are 1060 rows but actually returning nothing.

-> Sort (cost=86620.57..87067.55 rows=178792 width=94)
> (actual time=7408.709..7925.843 rows=177769 loops=1)
> Sort Key: a.department_id
> -> Merge Right Join (cost=45871.18..46952.83
> rows=178792 width=94) (actual time=4962.122..6671.319 rows=177769 loops=1)
> Merge Cond: ("outer".id = "inner".project_id)
> -> Index Scan using project_id_key on project p
> (cost=0.00..49.80 rows=800 width=36) (actual time=0.007..0.007 rows=0
> loops=1)
> -> Sort (cost=45871.18..46318.16 rows=178792
> width=62) (actual time=4962.084..5475.636 rows=177769 loops=1)
> Sort Key: ac.project_id
> -> Hash Join (cost=821.20..13193.43
> rows=178792 width=62) (actual time=174.905..4295.685 rows=177769 loops=1)
> Hash Cond: ("outer".chart_id = "inner".id)
> -> Hash Join (cost=817.66..10508.02
> rows=178791
> width=20) (actual time=173.952..2840.824 rows=177769 loops=1)
> Hash Cond: ("outer".trans_id =
> "inner".id)
> -> Seq Scan on acc_trans ac
> (cost=0.00..3304.38 rows=181538 width=12) (actual time=0.062..537.753
> rows=181322 loops=1)
> -> Hash (cost=659.55..659.55
> rows=22844 width=8) (actual time=173.625..173.625 rows=0 loops=1)
> -> Seq Scan on ar a
> (cost=0.00..659.55 rows=22844 width=8) (actual time=0.022..101.828
> rows=22844 loops=1)
> Filter: (customer_id
> = 11373)
> -> Hash (cost=3.23..3.23 rows=123
> width=50) (actual time=0.915..0.915 rows=0 loops=1)
> -> Seq Scan on chart c
> (cost=0.00..3.23 rows=123 width=50) (actual time=0.013..0.528 rows=123
> loops=1)
> -> Hash (cost=716.67..716.67 rows=1 width=4) (actual
> time=129.037..129.037 rows=0 loops=1)
> -> Subquery Scan "IN_subquery" (cost=716.66..716.67 rows=1
> width=4) (actual time=129.017..129.025 rows=1 loops=1)
> -> Aggregate (cost=716.66..716.66 rows=1 width=4)
> (actual time=129.008..129.011 rows=1 loops=1)
> -> Seq Scan on ar (cost=0.00..659.55 rows=22844
> width=4) (actual time=0.020..73.266 rows=22844 loops=1)
> Filter: (customer_id = 11373)
> Total runtime: 9954.133 ms
> (28 rows)
>
> The shorter query's plan is:
>
>
> QUERY PLAN
>
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
> Hash Join (cost=728.42..732.96 rows=8 width=50) (actual
> time=130.908..131.593 rows=10 loops=1)
> Hash Cond: ("outer".id = "inner".chart_id)
> -> Seq Scan on chart c (cost=0.00..3.23 rows=123 width=50) (actual
> time=0.006..0.361 rows=123 loops=1)
> -> Hash (cost=728.40..728.40 rows=8 width=8) (actual
> time=130.841..130.841 rows=0 loops=1)
> -> Nested Loop (cost=716.67..728.40 rows=8 width=8) (actual
> time=130.692..130.805 rows=10 loops=1)
> -> Nested Loop (cost=716.67..720.89 rows=1 width=8)
> (actual time=130.626..130.639 rows=1 loops=1)
> -> HashAggregate (cost=716.67..716.67 rows=1
> width=4) (actual time=130.484..130.487 rows=1 loops=1)
> -> Subquery Scan "IN_subquery"
> (cost=716.66..716.67 rows=1 width=4) (actual time=130.455..130.464
> rows=1 loops=1)
> -> Aggregate (cost=716.66..716.66
> rows=1 width=4) (actual time=130.445..130.448 rows=1 loops=1)
> -> Seq Scan on ar
> (cost=0.00..659.55 rows=22844 width=4) (actual time=0.020..74.174
> rows=22844 loops=1)
> Filter: (customer_id = 11373)
> -> Index Scan using ar_id_key on ar a
> (cost=0.00..4.20 rows=1 width=4) (actual time=0.122..0.125 rows=1 loops=1)
> Index Cond: (a.id <http://a.id> = "outer".max)
> Filter: (customer_id = 11373)
> -> Index Scan using acc_trans_trans_id_key on acc_trans
> ac (cost=0.00..7.41 rows=8 width=12) (actual time=0.051..0.097 rows=10
> loops=1)
> Index Cond: ("outer".max = ac.trans_id)
> Total runtime: 131.879 ms
> (17 rows)
>
> I am not sure if I want to remove support for the other two tables
> yet. However, I wanted to submit this here as a (possibly corner-)
> case where the plan seems to be far slower than it needs to be.
>
> Best Wishes,
> Chris Travers
> Metatron Technology Consulting
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

--
with regards,
S.Gnanavel
Satyam Computer Services Ltd.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Stark 2005-07-28 11:49:29 Re: wal_buffer tests in
Previous Message Gavin Sherry 2005-07-28 09:34:50 Re: Finding bottleneck