Hash Join performance

From: Vamsidhar Thummala <vamsi(at)cs(dot)duke(dot)edu>
To: pgsql-performance(at)postgresql(dot)org
Subject: Hash Join performance
Date: 2009-03-13 21:15:07
Message-ID: e0e3da5e0903131415k38dc870bo15c6ad2625cd922b@mail.gmail.com
Lists: pgsql-performance

From the documentation, I understand that the range of actual times reported
for a plan node represents the time taken to retrieve the first result and
the last result, respectively (a trivial sanity check of this reading appears
right after the plan). However, the following EXPLAIN ANALYZE output confuses me:

GroupAggregate  (cost=632185.58..632525.55 rows=122884 width=57) (actual time=187382.499..187383.241 rows=57 loops=1)
  ->  Sort  (cost=632185.58..632201.78 rows=122884 width=57) (actual time=187167.792..187167.905 rows=399 loops=1)
        Sort Key: orders.o_totalprice, orders.o_orderdate, customer.c_name, customer.c_custkey, orders.o_orderkey
        ->  Hash Join  (cost=399316.78..629388.21 rows=122884 width=57) (actual time=122805.133..186107.210 rows=399 loops=1)
              Hash Cond: (public.lineitem.l_orderkey = orders.o_orderkey)
              ->  Seq Scan on lineitem  (cost=0.00..163912.71 rows=6000742 width=14) (actual time=0.022..53597.555 rows=6001215 loops=1)
              ->  Hash  (cost=398960.15..398960.15 rows=30713 width=51) (actual time=112439.592..112439.592 rows=57 loops=1)
                    ->  Hash Join  (cost=369865.37..398960.15 rows=30713 width=51) (actual time=80638.283..111855.510 rows=57 loops=1)
                          Hash Cond: (orders.o_custkey = customer.c_custkey)
                          ->  Nested Loop  (cost=364201.67..391753.70 rows=30713 width=29) (actual time=75598.246..107760.054 rows=57 loops=1)
                                ->  GroupAggregate  (cost=364201.67..366634.97 rows=30713 width=14) (actual time=75427.115..96167.167 rows=57 loops=1)
                                      Filter: (sum(l_quantity) > 300::numeric)
                                      ->  Sort  (cost=364201.67..364992.54 rows=6000742 width=14) (actual time=74987.112..86289.063 rows=6001215 loops=1)
                                            Sort Key: public.lineitem.l_orderkey
                                            ->  Seq Scan on lineitem  (cost=0.00..163912.71 rows=6000742 width=14) (actual time=0.006..51153.880 rows=6001215 loops=1)
                                ->  Index Scan using orders_pkey on orders  (cost=0.00..0.81 rows=1 width=25) (actual time=169.485..173.006 rows=1 loops=57)
                                      Index Cond: (orders.o_orderkey = "IN_subquery".l_orderkey)
                          ->  Hash  (cost=4360.96..4360.96 rows=150072 width=26) (actual time=998.378..998.378 rows=150000 loops=1)
                                ->  Seq Scan on customer  (cost=0.00..4360.96 rows=150072 width=26) (actual time=8.188..883.778 rows=150000 loops=1)
Total runtime: 187644.927 ms
(20 rows)
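As a sanity check of how I read the actual-time range, I tried a trivial
single-node plan (a throwaway example, unrelated to the query; the output
numbers below are placeholders, not measured values):

    EXPLAIN ANALYZE SELECT * FROM generate_series(1, 1000000) g(i);
    -- Function Scan on generate_series g (...)
    --   (actual time=<time to first row>..<time to last row> rows=1000000 loops=1)

On a single node like this, the two numbers clearly mean "time to first row"
and "time to last row" for that node; my confusion is about how they nest in
the plan above.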

My settings: RAM 1GB; data size 1GB (lineitem ~650MB); shared_buffers = 200MB;
work_mem = 1MB.

PostgreSQL version: 8.2, OS: Sun Solaris 10u4

Query: TPC-H Q18, the Large Volume Customer query (text below).
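For reference, the standard Q18 text, with the quantity parameter at its
validation value of 300 (shown without the spec's final row limit, which
matches the absence of a Limit node in the plan above):

    SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice,
           sum(l_quantity)
    FROM customer, orders, lineitem
    WHERE o_orderkey IN (SELECT l_orderkey
                         FROM lineitem
                         GROUP BY l_orderkey
                         HAVING sum(l_quantity) > 300)
      AND c_custkey = o_custkey
      AND o_orderkey = l_orderkey
    GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    ORDER BY o_totalprice DESC, o_orderdate;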

Questions:

1) The actual time on the Seq Scan on lineitem shows that the first record is
fetched at 0.022ms and the last record at 53.5s. Does that mean the
sequential scan completes within the first 53.5s of the query (absolute
time)? Or does it mean the scan starts at 112.4s (after the build phase of
the Hash Join) and finishes at 165.9s (112.4s + 53.5s)? My understanding is
that the former is true. If so, the sequential scan has to fetch around 6M
records (~650MB) ahead of the build phase of the Hash Join, which seems
surprising. Is this what is called prefetching at the DB level? And where
does the DB hold all these records? It certainly can't hold them in
shared_buffers, which is only 200MB. (An experiment to time the scan by
itself is sketched below.)
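One diagnostic I can think of, to separate the raw scan cost from the join
(just a sketch; the repeat assumes the table still fits in the OS cache
after the first pass):

    -- Time the scan by itself, outside any join:
    EXPLAIN ANALYZE SELECT * FROM lineitem;
    -- Run it a second time to compare cold vs. warm cache behavior:
    EXPLAIN ANALYZE SELECT * FROM lineitem;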

2) Why is the topmost Hash Join so slow? The hash is built over the output of
the subplan, which produces just 57 records (~20KB), so we can assume those
57 records fit in work_mem. Yet the Hash Join produces its first record at
122.8s, whereas the hash build completes at 112.4s, a 10.4s gap (in some
cases I have seen this gap be even worse). Also, the total time spent in the
Hash Join is about 63.3s (186.1s - 122.8s), which seems too high given that
lineitem is already in the buffer cache. What is happening here? (A work_mem
experiment is sketched below.)
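One experiment I plan to try (the value is only an example): raise work_mem
for the session and re-run, to see how much of the time comes from the big
sort and the hash steps spilling to disk at the 1MB setting:

    -- Session-local override; pick a value the machine can afford:
    SET work_mem = '64MB';
    -- then re-run: EXPLAIN ANALYZE <the Q18 query above>;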

Appreciate your help!

Regards,
~Vamsi
