Skip site navigation (1) Skip section navigation (2)

Re: When to bump up statistics?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org,Dawid Kuroczko <qnex42(at)gmail(dot)com>
Subject: Re: When to bump up statistics?
Date: 2004-11-19 20:32:59
Message-ID: 200411191232.59600.josh@agliodbs.com (view raw or flat)
Thread:
Lists: pgsql-performance
Dawid,

> I wonder what are the implications of using this statement,
> I know by using, say n=100, ANALYZE will take more time,
> pg_statistics will be bigger, planner will take longer time,
> on the other hand it will make better decisions... Etc, etc.

Yep.   And pg_statistics will need to be vacuumed more often.

> Is bumping up statistics is only useful for indexed columns?

No.   It's potentially useful for any queried column.

> 1) huge table with huge number of distinct values (_almost_
>     unique ;))

Yes.

> 2) huge table with relatively equally distributed values
>     (like each value is in between, say, 30-50 rows).

Not usually.

> 3) huge table with unequally distributed values (some
>     values are in 1-5 rows, some are in 1000-5000 rows).

Yes.

> 4) huge table with small number values (around ~100
>     distinct values, equally or uneqally distributed).

Not usually, especially if they are equally distributed.

> 5) boolean column.

Almost never, just as it is seldom useful to index a boolean column.

> I think SET STATISTICS 100 is very useful for case with
> unequally distributed values, but I wonder what about
> the other cases.  And as a side note -- what are the
> reasonable bounds for statistics (between 10 and 100?)

Oh, no, I've used values up to 500 in production, and we've tested up to the 
max on DBT-3.    In my experience, if the default (10) isn't sufficient, you 
often have to go up to > 250 to get a different plan.

> What are the runtime implications of setting statistics
> too large -- how much can it affect queries?

It won't affect select queries.   It will affect ANALYZE time (substantially 
in the aggregate) and maintenance on the pg_statistics table.

> And finally -- how other RDBMS and RDBM-likes deal
> with this issue? :)

Most don't allow such fine-tuned adjustment.   MSSQL, for example, allows only 
setting it per-table or maybe even database-wide, and on that platform it 
doesn't seem to have much effect on query plans.    Oracle prefers to use 
HINTS, which are a brute-force method to manage query plans.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

In response to

pgsql-performance by date

Next:From: Arshavir GrigorianDate: 2004-11-19 20:39:51
Subject: Re: index use
Previous:From: Stephan SzaboDate: 2004-11-19 20:29:42
Subject: Re: index use

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group