| From: | Christian Schröder <cs(at)deriva(dot)de> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Wrong estimation of rows for hash join | 
| Date: | 2009-10-16 08:59:26 | 
| Message-ID: | 4AD835EE.7060504@deriva.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
Hi list,
I have the following query:
    SELECT *
    FROM base
    INNER JOIN pt USING (x) WHERE pt.y IN ('1121', '11411', '11421', 
'1161', '1162');
"explain analyze" yields the following result:
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=529.18..164930.70 rows=28374 width=0) (actual 
time=10.834..4877.326 rows=245298 loops=1)
   Hash Cond: (b.x = pt.x)
   ->  Seq Scan on b  (cost=0.00..159579.93 rows=1210093 width=4) 
(actual time=0.018..2464.871 rows=1210250 loops=1)
   ->  Hash  (cost=527.41..527.41 rows=142 width=4) (actual 
time=10.775..10.775 rows=138 loops=1)
         ->  Seq Scan on pt  (cost=0.00..527.41 rows=142 width=4) 
(actual time=0.057..10.556 rows=138 loops=1)
               Filter: (y = ANY ('{1121,11411,11421,1161,1162}'::bpchar[]))
 Total runtime: 5170.837 ms
As you can see, the estimated number of rows in the join is much lower 
than the actual number. Normally, the join is part of a more complex 
query which gets really slow, probably (among other reasons) because the 
query planner uses the incorrect row estimate for its further plans.
Question is: What can I do to improve that estimation? There is a 
foreign key from base.x to pt.x and both fields are declared not null.
Side question: What can I do to improve the speed of the hash join 
itself? I understand that 2.5 secs are needed to perform the sequential 
scan of table b, but where do the remaining 2.5 secs come from?
Thanks a lot in advance,
    Christian
-- 
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen
Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Arnaud Lesauvage | 2009-10-16 09:36:11 | pgsql2shp : Encoding headache | 
| Previous Message | Viktor Rosenfeld | 2009-10-16 07:23:42 | Re: Can't find documentation for ~=~ operator |