Seq scan on join table despite index and high statistics

From: Henrik Zagerholm <henke(at)mac(dot)se>
To: pgsql-performance(at)postgresql(dot)org
Subject: Seq scan on join table despite index and high statistics
Date: 2007-07-31 08:23:38
Message-ID: CCDAD82F-2714-496B-8D4E-63A39E8D2C76@mac.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello list,

I have a problem with a simple count query on a pgsql 8.2.3 server.

SELECT COUNT(pk_file_structure_id) FROM tbl_file_structure INNER JOIN
tbl_file ON fk_file_id = pk_file_id WHERE lower(file_name) like lower
('awstats%');
Using Explain analyze I've noticed that it makes a seq scan on
tbl_file_structure but I have an index on fk_file_id and its
statistics is set to 200. I ran an analyze on both tbl_file and
tbl_file_structure.
The count retrieved is 75 000 so its way lower than the total 3 834
059 rows.

Should I raise the statistics more? Is there a rule of thumb how much
the statistics should be reagards to the number of rows in the table?
Can I make my database adjust the statistics dynamically? I don't
want to go around to my customers changing statistics every time the
tables starts to fill up.

Anyway here is the explain analyze on the slow query.

EXPLAIN ANALYZE SELECT COUNT(pk_file_structure_id) FROM
tbl_file_structure INNER JOIN tbl_file ON fk_file_id = pk_file_id
WHERE lower(file_name) like lower('awstats%');

"Aggregate (cost=172512.17..172512.18 rows=1 width=8) (actual
time=30316.316..30316.317 rows=1 loops=1)"
" -> Hash Join (cost=12673.69..171634.39 rows=351110 width=8)
(actual time=1927.730..30191.260 rows=75262 loops=1)"
" Hash Cond: (tbl_file_structure.fk_file_id =
tbl_file.pk_file_id)"
" -> Seq Scan on tbl_file_structure (cost=0.00..80537.59
rows=3834059 width=16) (actual time=10.056..14419.662 rows=3834059
loops=1)"
" -> Hash (cost=11999.34..11999.34 rows=39868 width=8)
(actual time=1896.859..1896.859 rows=39959 loops=1)"
" -> Bitmap Heap Scan on tbl_file
(cost=1157.12..11999.34 rows=39868 width=8) (actual
time=457.867..1779.792 rows=39959 loops=1)"
" Filter: (lower((file_name)::text) ~~ 'awstats
%'::text)"
" -> Bitmap Index Scan on tbl_file_idx
(cost=0.00..1147.15 rows=35881 width=0) (actual time=450.469..450.469
rows=39959 loops=1)"
" Index Cond: ((lower((file_name)::text)
~>=~ 'awstats'::character varying) AND (lower((file_name)::text) ~<~
'awstatt'::character varying))"
"Total runtime: 30316.739 ms"

Could this have something to do with low settings in postgresql.conf?
I haven't tweaked any settings in postgresql.conf yet.

Please help,
Regards, henke

Browse pgsql-performance by date

  From Date Subject
Next Message Andreas Tille 2007-07-31 14:10:03 Using EXECUTE in a function
Previous Message Tom Lane 2007-07-31 05:19:16 Re: disk filling up