Cost Issue - How do I force a Hash Join

From: "Virag Saksena" <virag(at)auptyma(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Cost Issue - How do I force a Hash Join
Date: 2006-02-21 05:13:34
Message-ID: 028401c636a5$907fb4b0$3100000a@demo1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,
I have query where I do two inline queries (which involves grouping) and then join them with an outer join.
The individual queries run in 50-300 ms. However the optimizer is choosing a nested loop to join them rather than a Hash join
causing the complete query to take 500+ seconds. It expects that it will get 1 row out from each of the sources, but here is gets
several thousand rows.

Is there any way I can get a hash join used on the outer join, while preserving the nested loops.

explain analyze
select o1.objaddr, o1.fieldname, o1.objsig,
o1.totmem, o1.cnt,
o2.totmem, o2.cnt
from
( select min(ao.objaddr) as objaddr, count(*) as cnt,
sum(ao.totmem) as totmem, ao.objsig, ar.fieldname, ao.objtype
from jam_heapobj ao, jam_heaprel ar
where ar.heap_id = 1 and ar.parentaddr = 0 and ar.fieldname = 'K'
and ao.heap_id = ar.heap_id and ao.objaddr = ar.childaddr
group by ao.objsig, ar.fieldname, ao.objtype) o1
left outer join
(select min(bo.objaddr) as objaddr, count(*) as cnt,
sum(bo.totmem) as totmem, bo.objsig, br.fieldname, bo.objtype
from jam_heapobj bo, jam_heaprel br
where br.heap_id = 0 and br.parentaddr = 0 and br.fieldname = 'K'
and bo.heap_id = br.heap_id and bo.objaddr = br.childaddr
group by bo.objsig, br.fieldname, bo.objtype) o2
on ( o2.objsig = o1.objsig and o2.objtype = o1.objtype
and o2.fieldname = o1.fieldname)
order by o1.totmem - coalesce(o2.totmem,0) desc;

Sort (cost=16305.41..16305.42 rows=1 width=562) (actual time=565997.769..566016.255 rows=6115 loops=1)
Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
->Nested Loop Left Join (cost=16305.22..16305.40 rows=1 width=562) (actual time=612.631..565896.047 rows=6115 loops=1)
Join Filter: ((("inner".objsig)::text = ("outer".objsig)::text) AND (("inner".objtype)::text = ("outer".objtype)::text) AND (("inner".fieldname)::text = ("outer".fieldname)::text))
->Subquery Scan o1 (cost=12318.12..12318.15 rows=1 width=514) (actual time=309.659..413.311 rows=6115 loops=1)
->HashAggregate (cost=12318.12..12318.14 rows=1 width=54) (actual time=309.649..367.206 rows=6115 loops=1)
->Nested Loop (cost=0.00..12317.90 rows=10 width=54) (actual time=0.243..264.116 rows=6338 loops=1)
->Index Scan using jam_heaprel_n1 on jam_heaprel ar (cost=0.00..12275.00 rows=7 width=19) (actual time=0.176..35.780 rows=6338 loops=1)
Index Cond: ((heap_id = 1) AND (parentaddr = 0))
Filter: ((fieldname)::text = 'K'::text)
->Index Scan using jam_heapobj_u1 on jam_heapobj ao (cost=0.00..6.10 rows=2 width=51) (actual time=0.019..0.022 rows=1 loops=6338)
Index Cond: ((ao.heap_id = 1) AND (ao.objaddr = "outer".childaddr))
->Subquery Scan o2 (cost=3987.10..3987.13 rows=1 width=514) (actual time=0.062..75.171 rows=6038 loops=6115)
->HashAggregate (cost=3987.10..3987.12 rows=1 width=54) (actual time=0.056..36.469 rows=6038 loops=6115)
->Nested Loop (cost=0.00..3987.05 rows=2 width=54) (actual time=0.145..257.876 rows=6259 loops=1)
->Index Scan using jam_heaprel_n1 on jam_heaprel br (cost=0.00..3974.01 rows=3 width=19) (actual time=0.074..35.124 rows=6259 loops=1)
Index Cond: ((heap_id = 0) AND (parentaddr = 0))
Filter: ((fieldname)::text = 'K'::text)
->Index Scan using jam_heapobj_u1 on jam_heapobj bo (cost=0.00..4.33 rows=1 width=51) (actual time=0.018..0.022 rows=1 loops=6259)
Index Cond: ((bo.heap_id = 0) AND (bo.objaddr = "outer".childaddr))
Total runtime: 566044.187 ms
(21 rows)

Regards,

Virag

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-02-21 05:35:55 Re: Cost Issue - How do I force a Hash Join
Previous Message Chris 2006-02-21 00:21:52 Re: Creating a correct and real benchmark