Re: Strange statistics

From: Henrik <henke(at)mac(dot)se>
To: Joris Dobbelsteen <joris(at)familiedobbelsteen(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Strange statistics
Date: 2008-06-05 07:32:42
Message-ID: 7A5A80E8-8F03-40C2-B2B8-9E70AE89DE04@mac.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


3 jun 2008 kl. 23.31 skrev Joris Dobbelsteen:

> Henrik wrote:
>> Hi list,
>> I'm having a table with a lots of file names in it. (Aprox 3
>> million) in a 8.3.1 db.
>> Doing this simple query shows that the statistics is way of but I
>> can get them right even when I raise the statistics to 1000.
>> db=# alter table tbl_file alter file_name set statistics 1000;
>> ALTER TABLE
>> db=# analyze tbl_file;
>> ANALYZE
>> db=# explain analyze select * from tbl_file where lower(file_name)
>> like lower('to%');
>> QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------- Bitmap
>> Heap Scan on tbl_file (cost=23.18..2325.13 rows=625 width=134)
>> (actual time=7.938..82.386 rows=17553 loops=1)
>> Filter: (lower((file_name)::text) ~~ 'to%'::text)
>> -> Bitmap Index Scan on tbl_file_idx (cost=0.00..23.02 rows=625
>> width=0) (actual time=6.408..6.408 rows=17553 loops=1)
>> Index Cond: ((lower((file_name)::text) ~>=~ 'to'::text) AND
>> (lower((file_name)::text) ~<~ 'tp'::text))
>> Total runtime: 86.230 ms
>> (5 rows)
>> How can it be off by a magnitude of 28??
>
> These are statistics and represent an only estimate! In this case,
> the planner seems to be doing the right thing(tm) anyway.
>
> Statistics is a frequently misunderstood subject and usually
> provides excellent material to draw plain wrong conclusions. There
> is a good chance that due to the physical layout of your data, the
> algorithms in the statistics collector, the existence of uncertainty
> and some more unknown factors your statistics will be biased. This
> is a situations where you noticed it.
>
> Running "SELECT * FROM pg_stats;" will give you the statistics the
> planner uses and can provide some hints to why the planner has
> chosen these estimates.
> Probably statistics will vary between ANALYZE runs. Its also
> possible to try "CLUSTER" and friends. Try different queries and
> look at the deviations.
Thanks Joris for your input. You are the second person that suggests
CLUSTER for me. Maybe I should take a look. The problem is that our
select queries are kinda random. Would CLUSTER help then also? Should
I just CLUSTER on the moste used index or?

Thanks
/henke

>
>
> All in all, you should really start worrying when the planner starts
> planning inefficient queries. Since its a filename, it might be
> highly irregular (random) and a low statistics target might be good
> enough anyways.
>
> Unfortunately I'm not a statistics expert...
>
> - Joris

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Henrik 2008-06-05 07:35:58 Re: Database growing. Need autovacuum help.
Previous Message Richard Broersma 2008-06-05 06:39:00 Re: Tripping up on my first attempt at building PG from source