Re: Cost Issue - How do I force a Hash Join

From: "Virag Saksena" <virag(at)auptyma(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Cost Issue - How do I force a Hash Join
Date: 2006-02-21 06:33:39
Message-ID: 02a101c636b0$c08e19c0$3100000a@demo1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tables are analyzed, though I would love to find a way to increase it's
accuracy of statistics
Tried raising the statistics target upto 100, but it did not help. Should I
bump it even more

However I found that if I add depth to the group by clauses, it somehow
tells the optimizer that it would get more than 1 row
and it goes to a Hash Join ....
For this query, only rows with one value of depth are accessed, so we are
okay ... but I would like to see if there is some other
way I can get a better approximation for the costs

Sort (cost=25214.36..25214.39 rows=10 width=958) (actual
time=9798.860..9815.670 rows=6115 loops=1)
Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
->Hash Left Join (cost=25213.83..25214.19 rows=10 width=958) (actual
time=8526.248..9755.721 rows=6115 loops=1)
Hash Cond: ((("outer".objsig)::text = ("inner".objsig)::text) AND
(("outer".objtype)::text = ("inner".objtype)::text) AND
(("outer".fieldname)::text = ("inner".fieldname)::text))
->Subquery Scan o1 (cost=18993.48..18993.66 rows=10 width=990) (actual
time=6059.880..6145.223 rows=6115 loops=1)
->HashAggregate (cost=18993.48..18993.56 rows=10 width=46) (actual
time=6059.871..6094.897 rows=6115 loops=1)
->Nested Loop (cost=0.00..18993.22 rows=15 width=46) (actual
time=45.510..5980.807 rows=6338 loops=1)
->Index Scan using jam_heaprel_n1 on jam_heaprel ar
(cost=0.00..18932.01 rows=10 width=19) (actual time=45.374..205.520
rows=6338 loops=1)
Index Cond: ((heap_id = 1) AND (parentaddr = 0))
Filter: ((fieldname)::text = 'K'::text)
->Index Scan using jam_heapobj_u1 on jam_heapobj ao
(cost=0.00..6.10 rows=2 width=43) (actual time=0.885..0.890 rows=1
loops=6338)
Index Cond: ((ao.heap_id = 1) AND (ao.objaddr =
"outer".childaddr))
->Hash (cost=6220.34..6220.34 rows=2 width=982) (actual
time=2466.178..2466.178 rows=0 loops=1)
->Subquery Scan o2 (cost=6220.30..6220.34 rows=2 width=982) (actual
time=2225.242..2433.744 rows=6038 loops=1)
->HashAggregate (cost=6220.30..6220.32 rows=2 width=46) (actual
time=2225.233..2366.890 rows=6038 loops=1)
->Nested Loop (cost=0.00..6220.27 rows=2 width=46) (actual
time=0.449..2149.257 rows=6259 loops=1)
->Index Scan using jam_heaprel_n1 on jam_heaprel br
(cost=0.00..6202.89 rows=4 width=19) (actual time=0.296..51.310 rows=6259
loops=1)
Index Cond: ((heap_id = 0) AND (parentaddr = 0))
Filter: ((fieldname)::text = 'K'::text)
->Index Scan using jam_heapobj_u1 on jam_heapobj bo
(cost=0.00..4.33 rows=1 width=43) (actual time=0.294..0.300 rows=1
loops=6259)
Index Cond: ((bo.heap_id = 0) AND (bo.objaddr =
"outer".childaddr))
Total runtime: 9950.192 ms

Regards,

Virag

----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Virag Saksena" <virag(at)auptyma(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Sent: Monday, February 20, 2006 9:35 PM
Subject: Re: [PERFORM] Cost Issue - How do I force a Hash Join

> "Virag Saksena" <virag(at)auptyma(dot)com> writes:
> > The individual queries run in 50-300 ms. However the optimizer is =
> > choosing a nested loop to join them rather than a Hash join
> > causing the complete query to take 500+ seconds. It expects that it will
=
> > get 1 row out from each of the sources, but here is gets
> > several thousand rows.
>
> The best approach is to see if you can't fix that estimation error.
> Are the stats up to date on these tables? If so, maybe raising the
> statistics targets would help.
>
> regards, tom lane
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Ibrahim Tekin 2006-02-21 15:57:12 LIKE query on indexes
Previous Message Craig A. James 2006-02-21 05:54:14 Re: Cost Issue - How do I force a Hash Join