Wrong estimation of rows for hash join

From: Christian Schröder <cs(at)deriva(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Wrong estimation of rows for hash join
Date: 2009-10-16 08:59:26
Message-ID: 4AD835EE.7060504@deriva.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi list,
I have the following query:
SELECT *
FROM base
INNER JOIN pt USING (x) WHERE pt.y IN ('1121', '11411', '11421',
'1161', '1162');

"explain analyze" yields the following result:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=529.18..164930.70 rows=28374 width=0) (actual
time=10.834..4877.326 rows=245298 loops=1)
Hash Cond: (b.x = pt.x)
-> Seq Scan on b (cost=0.00..159579.93 rows=1210093 width=4)
(actual time=0.018..2464.871 rows=1210250 loops=1)
-> Hash (cost=527.41..527.41 rows=142 width=4) (actual
time=10.775..10.775 rows=138 loops=1)
-> Seq Scan on pt (cost=0.00..527.41 rows=142 width=4)
(actual time=0.057..10.556 rows=138 loops=1)
Filter: (y = ANY ('{1121,11411,11421,1161,1162}'::bpchar[]))
Total runtime: 5170.837 ms

As you can see, the estimated number of rows in the join is much lower
than the actual number. Normally, the join is part of a more complex
query which gets really slow, probably (among other reasons) because the
query planner uses the incorrect row estimate for its further plans.

Question is: What can I do to improve that estimation? There is a
foreign key from base.x to pt.x and both fields are declared not null.

Side question: What can I do to improve the speed of the hash join
itself? I understand that 2.5 secs are needed to perform the sequential
scan of table b, but where do the remaining 2.5 secs come from?

Thanks a lot in advance,
Christian

--
Deriva GmbH Tel.: +49 551 489500-42
Financial IT and Consulting Fax: +49 551 489500-91
Hans-Böckler-Straße 2 http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Arnaud Lesauvage 2009-10-16 09:36:11 pgsql2shp : Encoding headache
Previous Message Viktor Rosenfeld 2009-10-16 07:23:42 Re: Can't find documentation for ~=~ operator