Re: raising the default default_statistics_target

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: raising the default default_statistics_target
Date: 2004-03-08 17:17:52
Message-ID: 200403080917.52738.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Neil,

> In the simple test I performed, raising the default_statistics_target
> from 10 to 25 resulted in a 40% increase in the time to ANALYZE a
> large table. (I picked 25 more or less at random -- would 15 or 20 be
> better?)

I find that very interesting, since I haven't found much higher increases to
be a proportionate penality. For example, on an 11-column table raising 3
columns to statistics=250 merely doubled the ANALYZE time. I have not done
exact timing, but would be happy to ....

>It may also be the case that for those people for whom 10
> is an insufficient stats target, 25 is also insufficient.

It is. I've found that "problem" queries, especially those caused by real,
uneven distribution of data, require raising statistics to 150-400 in order
to fix. This is much to high a level to assign as a default.

> Any comments on whether increasing the default stats target is a good
> idea for 7.5? (Details on the test I performed are included below)

No. I don't think it's a good idea to raise the default for *all* columns;
for one thing, I'd really hate to think what, say, a default stats of 100
would do to a TEXT column with an average of 8K of data per row.

Further, in 7.5 we'll be introducing correlated stats for multi-column indexes
(unless something's gone off with that?) which should help a lot of problem
queries. And change our whole emphasis on brute forcing analyze through
increasing stats into the 100's.

If you really want to tackle this issue, though, here's what I suggest:

1) add a GUC called default_statistics_indexed, which starts at say 100 or 50.
2) When ever the user indexes a column, automatically increase the stats
to the level in default_statistics_indexed, if they are at the level in
default_statistics_target.

This will then give indexed columns "automatically" a somewhat higher level of
stats analysis than other columns. This should help a lot of "slow query"
problems, yet effectively leave the selection of "important" columns in the
hands of the DBA. Make sense?

Also, another great feature in this department would be to extend the
multi-column correlation statistics to cover foriegn keys, as a way of
improving cross-table estimates.

Anyway, keep me in the loop on this, I have a lot of very complex databases I
can test such issues on.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rod Taylor 2004-03-08 17:18:22 Re: question about selecting across multiple dbs
Previous Message Flavio 2004-03-08 17:02:46 Data from Binary Cursor