Skip site navigation (1) Skip section navigation (2)

Seq scan on join table despite index and high statistics

From: Henrik Zagerholm <henke(at)mac(dot)se>
To: pgsql-performance(at)postgresql(dot)org
Subject: Seq scan on join table despite index and high statistics
Date: 2007-07-31 08:23:38
Message-ID: CCDAD82F-2714-496B-8D4E-63A39E8D2C76@mac.se (view raw or flat)
Thread:
Lists: pgsql-performance
Hello list,

I have a problem with a simple count query on a pgsql 8.2.3 server.

SELECT COUNT(pk_file_structure_id) FROM tbl_file_structure INNER JOIN  
tbl_file ON fk_file_id = pk_file_id WHERE lower(file_name) like lower 
('awstats%');
Using Explain analyze I've noticed that it makes a seq scan on  
tbl_file_structure but I have an index on fk_file_id and its  
statistics is set to 200. I  ran an analyze on both tbl_file and  
tbl_file_structure.
The count retrieved is 75 000 so its way lower than the total 3 834  
059 rows.

Should I raise the statistics more? Is there a rule of thumb how much  
the statistics should be reagards to the number of rows in the table?
Can I make my database adjust the statistics dynamically? I don't  
want to go around to my customers changing statistics every time the  
tables starts to fill up.

Anyway here is the explain analyze on the slow query.

EXPLAIN ANALYZE SELECT COUNT(pk_file_structure_id) FROM  
tbl_file_structure INNER JOIN tbl_file ON fk_file_id = pk_file_id  
WHERE lower(file_name) like lower('awstats%');

"Aggregate  (cost=172512.17..172512.18 rows=1 width=8) (actual  
time=30316.316..30316.317 rows=1 loops=1)"
"  ->  Hash Join  (cost=12673.69..171634.39 rows=351110 width=8)  
(actual time=1927.730..30191.260 rows=75262 loops=1)"
"        Hash Cond: (tbl_file_structure.fk_file_id =  
tbl_file.pk_file_id)"
"        ->  Seq Scan on tbl_file_structure  (cost=0.00..80537.59  
rows=3834059 width=16) (actual time=10.056..14419.662 rows=3834059  
loops=1)"
"        ->  Hash  (cost=11999.34..11999.34 rows=39868 width=8)  
(actual time=1896.859..1896.859 rows=39959 loops=1)"
"              ->  Bitmap Heap Scan on tbl_file   
(cost=1157.12..11999.34 rows=39868 width=8) (actual  
time=457.867..1779.792 rows=39959 loops=1)"
"                    Filter: (lower((file_name)::text) ~~ 'awstats 
%'::text)"
"                    ->  Bitmap Index Scan on tbl_file_idx   
(cost=0.00..1147.15 rows=35881 width=0) (actual time=450.469..450.469  
rows=39959 loops=1)"
"                          Index Cond: ((lower((file_name)::text)  
~>=~ 'awstats'::character varying) AND (lower((file_name)::text) ~<~  
'awstatt'::character varying))"
"Total runtime: 30316.739 ms"

Could this have something to do with low settings in postgresql.conf?
I haven't tweaked any settings in postgresql.conf yet.

Please help,
Regards, henke 

pgsql-performance by date

Next:From: Andreas TilleDate: 2007-07-31 14:10:03
Subject: Using EXECUTE in a function
Previous:From: Tom LaneDate: 2007-07-31 05:19:16
Subject: Re: disk filling up

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group