Slow inner join, but left join is fast

From: "Jeremy Haile" <jhaile(at)fastmail(dot)fm>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow inner join, but left join is fast
Date: 2007-01-10 16:17:22
Message-ID: 1168445842.5566.1168570133@webmail.messagingengine.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a query made by joining two subqueries where the outer query
performing the join takes significantly longer to run than the two
subqueries.

The first subquery runs in 600ms. The seconds subquery runs in 700ms.
But the outer query takes 240 seconds to run! Both of the two
subqueries only return 8728 rows.

Changing the inner join to a left join makes the outer query run in
about 1000ms (which is great), but I don't understand why the inner join
is so slow!

I'm using PostgreSQL 8.2.1. Any ideas?

QUERY PLAN (Inner Join) - takes 240 seconds
-------------------
Nested Loop (cost=17.46..17.56 rows=1 width=120)
Join Filter: ((a.merchant_dim_id = b.merchant_dim_id) AND
(a.dcms_dim_id = b.dcms_dim_id))
-> HashAggregate (cost=8.71..8.74 rows=1 width=16)
-> Index Scan using transaction_facts_transaction_date_idx on
transaction_facts (cost=0.00..8.69 rows=1 width=16)
Index Cond: ((transaction_date >= '2007-01-09
00:00:00'::timestamp without time zone) AND
(transaction_date < '2007-01-09 09:30:00'::timestamp
without time zone))
-> HashAggregate (cost=8.75..8.78 rows=1 width=16)
-> HashAggregate (cost=8.71..8.72 rows=1 width=55)
-> Index Scan using
transaction_facts_transaction_date_idx on
transaction_facts (cost=0.00..8.69 rows=1 width=55)
Index Cond: ((transaction_date >= '2007-01-09
00:00:00'::timestamp without time zone) AND
(transaction_date < '2007-01-09 09:30:00'::timestamp
without time zone))

QUERY PLAN (Left Join) - takes one second
-------------------
Merge Left Join (cost=304037.63..304064.11 rows=2509 width=120)
Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id =
b.merchant_dim_id))
-> Sort (cost=152019.45..152025.72 rows=2509 width=64)
Sort Key: a.dcms_dim_id, a.merchant_dim_id
-> HashAggregate (cost=151771.15..151852.69 rows=2509
width=16)
-> Bitmap Heap Scan on transaction_facts
(cost=5015.12..150419.90 rows=77214 width=16)
Recheck Cond: ((transaction_date >= '2007-01-09
00:00:00'::timestamp without time zone) AND
(transaction_date < '2007-01-09 09:30:00'::timestamp
without time zone))
-> Bitmap Index Scan on
transaction_facts_transaction_date_idx
(cost=0.00..4995.81 rows=77214 width=0)
Index Cond: ((transaction_date >= '2007-01-09
00:00:00'::timestamp without time zone) AND
(transaction_date < '2007-01-09
09:30:00'::timestamp without time zone))
-> Sort (cost=152018.18..152020.54 rows=943 width=64)
Sort Key: b.dcms_dim_id, b.merchant_dim_id
-> Subquery Scan b (cost=151931.51..151971.59 rows=943
width=64)
-> HashAggregate (cost=151931.51..151962.16 rows=943
width=16)
-> HashAggregate (cost=151578.11..151672.35
rows=9424 width=55)
-> Bitmap Heap Scan on transaction_facts
(cost=5015.12..150419.90 rows=77214 width=55)
Recheck Cond: ((transaction_date >=
'2007-01-09 00:00:00'::timestamp without
time zone) AND (transaction_date <
'2007-01-09 09:30:00'::timestamp without
time zone))
-> Bitmap Index Scan on
transaction_facts_transaction_date_idx
(cost=0.00..4995.81 rows=77214 width=0)
Index Cond: ((transaction_date >=
'2007-01-09 00:00:00'::timestamp
without time zone) AND
(transaction_date < '2007-01-09
09:30:00'::timestamp without time
zone))

QUERY
-------------------
select a.merchant_dim_id, a.dcms_dim_id,
a.num_success, a.num_failed, a.total_transactions,
a.success_rate,
b.distinct_num_success, b.distinct_num_failed,
b.distinct_total_transactions, b.distinct_success_rate
from (

-- SUBQUERY 1
select merchant_dim_id,
dcms_dim_id,
sum(success) as num_success,
sum(failed) as num_failed,
count(*) as total_transactions,
(sum(success) * 1.0 / count(*)) as success_rate
from transaction_facts
where transaction_date >= '2007-1-9'
and transaction_date < '2007-1-9 9:30'
group by merchant_dim_id, dcms_dim_id

) as a inner join (

-- SUBQUERY 2
select merchant_dim_id,
dcms_dim_id,
sum(success) as distinct_num_success,
sum(failed) as distinct_num_failed,
count(*) as distinct_total_transactions,
(sum(success) * 1.0 / count(*)) as distinct_success_rate
from (

select merchant_dim_id,
dcms_dim_id,
serial_number,
success,
failed
from transaction_facts
where transaction_date >= '2007-1-9'
and transaction_date < '2007-1-9 9:30'
group by merchant_dim_id, dcms_dim_id, serial_number, success, failed

) as distinct_summary
group by merchant_dim_id, dcms_dim_id

) as b using(merchant_dim_id, dcms_dim_id)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message tsuraan 2007-01-10 17:14:13 Re: group by will not use an index?
Previous Message Tom Lane 2007-01-10 14:53:42 Re: Horribly slow query/ sequential scan