Skip site navigation (1) Skip section navigation (2)

Slow inner join, but left join is fast

From: "Jeremy Haile" <jhaile(at)fastmail(dot)fm>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow inner join, but left join is fast
Date: 2007-01-10 16:17:22
Message-ID: 1168445842.5566.1168570133@webmail.messagingengine.com (view raw or flat)
Thread:
Lists: pgsql-performance
I have a query made by joining two subqueries where the outer query
performing the join takes significantly longer to run than the two
subqueries.  

The first subquery runs in 600ms.  The seconds subquery runs in 700ms. 
But the outer query takes 240 seconds to run!  Both of the two
subqueries only return 8728 rows.  

Changing the inner join to a left join makes the outer query run in
about 1000ms (which is great), but I don't understand why the inner join
is so slow!

I'm using PostgreSQL 8.2.1.  Any ideas?

QUERY PLAN (Inner Join) - takes 240 seconds
-------------------
Nested Loop  (cost=17.46..17.56 rows=1 width=120)
  Join Filter: ((a.merchant_dim_id = b.merchant_dim_id) AND
  (a.dcms_dim_id = b.dcms_dim_id))
  ->  HashAggregate  (cost=8.71..8.74 rows=1 width=16)
        ->  Index Scan using transaction_facts_transaction_date_idx on
        transaction_facts  (cost=0.00..8.69 rows=1 width=16)
              Index Cond: ((transaction_date >= '2007-01-09
              00:00:00'::timestamp without time zone) AND
              (transaction_date < '2007-01-09 09:30:00'::timestamp
              without time zone))
  ->  HashAggregate  (cost=8.75..8.78 rows=1 width=16)
        ->  HashAggregate  (cost=8.71..8.72 rows=1 width=55)
              ->  Index Scan using
              transaction_facts_transaction_date_idx on
              transaction_facts  (cost=0.00..8.69 rows=1 width=55)
                    Index Cond: ((transaction_date >= '2007-01-09
                    00:00:00'::timestamp without time zone) AND
                    (transaction_date < '2007-01-09 09:30:00'::timestamp
                    without time zone))


QUERY PLAN (Left Join) - takes one second
-------------------
Merge Left Join  (cost=304037.63..304064.11 rows=2509 width=120)
  Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id =
  b.merchant_dim_id))
  ->  Sort  (cost=152019.45..152025.72 rows=2509 width=64)
        Sort Key: a.dcms_dim_id, a.merchant_dim_id
        ->  HashAggregate  (cost=151771.15..151852.69 rows=2509
        width=16)
              ->  Bitmap Heap Scan on transaction_facts 
              (cost=5015.12..150419.90 rows=77214 width=16)
                    Recheck Cond: ((transaction_date >= '2007-01-09
                    00:00:00'::timestamp without time zone) AND
                    (transaction_date < '2007-01-09 09:30:00'::timestamp
                    without time zone))
                    ->  Bitmap Index Scan on
                    transaction_facts_transaction_date_idx 
                    (cost=0.00..4995.81 rows=77214 width=0)
                          Index Cond: ((transaction_date >= '2007-01-09
                          00:00:00'::timestamp without time zone) AND
                          (transaction_date < '2007-01-09
                          09:30:00'::timestamp without time zone))
  ->  Sort  (cost=152018.18..152020.54 rows=943 width=64)
        Sort Key: b.dcms_dim_id, b.merchant_dim_id
        ->  Subquery Scan b  (cost=151931.51..151971.59 rows=943
        width=64)
              ->  HashAggregate  (cost=151931.51..151962.16 rows=943
              width=16)
                    ->  HashAggregate  (cost=151578.11..151672.35
                    rows=9424 width=55)
                          ->  Bitmap Heap Scan on transaction_facts 
                          (cost=5015.12..150419.90 rows=77214 width=55)
                                Recheck Cond: ((transaction_date >=
                                '2007-01-09 00:00:00'::timestamp without
                                time zone) AND (transaction_date <
                                '2007-01-09 09:30:00'::timestamp without
                                time zone))
                                ->  Bitmap Index Scan on
                                transaction_facts_transaction_date_idx 
                                (cost=0.00..4995.81 rows=77214 width=0)
                                      Index Cond: ((transaction_date >=
                                      '2007-01-09 00:00:00'::timestamp
                                      without time zone) AND
                                      (transaction_date < '2007-01-09
                                      09:30:00'::timestamp without time
                                      zone))


QUERY
-------------------
select a.merchant_dim_id, a.dcms_dim_id, 
       a.num_success, a.num_failed, a.total_transactions,
       a.success_rate,
       b.distinct_num_success, b.distinct_num_failed,
       b.distinct_total_transactions, b.distinct_success_rate
from (

-- SUBQUERY 1
select merchant_dim_id, 
       dcms_dim_id,
       sum(success) as num_success, 
       sum(failed) as num_failed, 
       count(*) as total_transactions,
       (sum(success) * 1.0 / count(*)) as success_rate 
from transaction_facts 
where transaction_date >= '2007-1-9' 
and transaction_date < '2007-1-9 9:30' 
group by merchant_dim_id, dcms_dim_id

) as a inner join (

-- SUBQUERY 2
select merchant_dim_id, 
       dcms_dim_id,
       sum(success) as distinct_num_success, 
       sum(failed) as distinct_num_failed, 
       count(*) as distinct_total_transactions, 
       (sum(success) * 1.0 / count(*)) as distinct_success_rate 
from (

  select merchant_dim_id, 
         dcms_dim_id,
         serial_number,
         success,
         failed       
  from transaction_facts 
  where transaction_date >= '2007-1-9' 
  and transaction_date < '2007-1-9 9:30' 
  group by merchant_dim_id, dcms_dim_id, serial_number, success, failed

  ) as distinct_summary
group by merchant_dim_id, dcms_dim_id

) as b using(merchant_dim_id, dcms_dim_id)

Responses

pgsql-performance by date

Next:From: tsuraanDate: 2007-01-10 17:14:13
Subject: Re: group by will not use an index?
Previous:From: Tom LaneDate: 2007-01-10 14:53:42
Subject: Re: Horribly slow query/ sequential scan

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group