Re: When to bump up statistics?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org, Dawid Kuroczko <qnex42(at)gmail(dot)com>
Subject: Re: When to bump up statistics?
Date: 2004-11-19 20:32:59
Message-ID: 200411191232.59600.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dawid,

> I wonder what are the implications of using this statement,
> I know by using, say n=100, ANALYZE will take more time,
> pg_statistics will be bigger, planner will take longer time,
> on the other hand it will make better decisions... Etc, etc.

Yep. And pg_statistics will need to be vacuumed more often.

> Is bumping up statistics is only useful for indexed columns?

No. It's potentially useful for any queried column.

> 1) huge table with huge number of distinct values (_almost_
> unique ;))

Yes.

> 2) huge table with relatively equally distributed values
> (like each value is in between, say, 30-50 rows).

Not usually.

> 3) huge table with unequally distributed values (some
> values are in 1-5 rows, some are in 1000-5000 rows).

Yes.

> 4) huge table with small number values (around ~100
> distinct values, equally or uneqally distributed).

Not usually, especially if they are equally distributed.

> 5) boolean column.

Almost never, just as it is seldom useful to index a boolean column.

> I think SET STATISTICS 100 is very useful for case with
> unequally distributed values, but I wonder what about
> the other cases. And as a side note -- what are the
> reasonable bounds for statistics (between 10 and 100?)

Oh, no, I've used values up to 500 in production, and we've tested up to the
max on DBT-3. In my experience, if the default (10) isn't sufficient, you
often have to go up to > 250 to get a different plan.

> What are the runtime implications of setting statistics
> too large -- how much can it affect queries?

It won't affect select queries. It will affect ANALYZE time (substantially
in the aggregate) and maintenance on the pg_statistics table.

> And finally -- how other RDBMS and RDBM-likes deal
> with this issue? :)

Most don't allow such fine-tuned adjustment. MSSQL, for example, allows only
setting it per-table or maybe even database-wide, and on that platform it
doesn't seem to have much effect on query plans. Oracle prefers to use
HINTS, which are a brute-force method to manage query plans.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Arshavir Grigorian 2004-11-19 20:39:51 Re: index use
Previous Message Stephan Szabo 2004-11-19 20:29:42 Re: index use