Re: Simple postgresql.conf wizard

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dann Corbit" <DCorbit(at)connx(dot)com>, "Decibel!" <decibel(at)decibel(dot)org>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Mark Wong" <markwkm(at)gmail(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Simple postgresql.conf wizard
Date: 2008-11-26 13:47:29
Message-ID: 87wseqppji.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Dann Corbit" <DCorbit(at)connx(dot)com> writes:
>> I also do not believe that there is any value that will be the right
>> answer. But a table of data might be useful both for people who want to
>> toy with altering the values and also for those who want to set the
>> defaults. I guess that at one time such a table was generated to
>> produce the initial estimates for default values.
>
> Sir, you credit us too much :-(. The actual story is that the current
> default of 10 was put in when we first implemented stats histograms,
> replacing code that kept track of only a *single* most common value
> (and not very well, at that). So it was already a factor of 10 more
> stats than we had experience with keeping, and accordingly conservatism
> suggested not boosting the default much past that.

I think that's actually too little credit. The sample size is chosen quite
carefully, based on solid mathematics, to provide a specific confidence
interval for estimates of queries covering ranges the size of a whole bucket.
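
If memory serves, analyze.c's choice of 300 * stats_target sample rows comes
from the bound in Chaudhuri, Motwani & Narasayya's SIGMOD '98 paper on
histogram construction. Back-of-the-envelope, in Python (the function and
parameter names are mine, and the f/gamma values are my guess at the ones
the code assumes, so treat this as a sketch):

    import math

    def sample_rows(k, n, f=0.5, gamma=0.01):
        # Rows to sample so that, with probability 1 - gamma, each of
        # k equi-depth buckets has relative error at most f
        # (Chaudhuri/Motwani/Narasayya bound).
        return 4 * k * math.log(2 * n / gamma) / f ** 2

    print(sample_rows(10, 10 ** 6))   # ~3058, i.e. ~306 rows per bucket

Note that the table size n only appears under the logarithm, which is why a
fixed multiple of the stats target works tolerably well across table sizes.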

The actual number of buckets is more of an arbitrary choice. It depends
entirely on how your data is distributed and how large a range your queries
cover. A uniformly distributed data set should need only a single bucket to
generate good estimates; less evenly distributed data sets need more.
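
To make that concrete, here's a toy equi-depth histogram in Python -- nothing
to do with the real planner code, and the Pareto skew is just illustrative:

    import random

    def bounds(data, k):
        # Equi-depth bucket boundaries: each bucket holds ~len(data)/k rows.
        s = sorted(data)
        return [s[int(i * (len(s) - 1) / k)] for i in range(k + 1)]

    def est_fraction(b, lo, hi):
        # Estimated fraction of rows in [lo, hi), assuming values are
        # spread uniformly within each bucket -- the planner's assumption.
        k = len(b) - 1
        total = 0.0
        for i in range(k):
            left, right = b[i], b[i + 1]
            if right <= lo or left >= hi or right == left:
                continue
            total += (min(hi, right) - max(lo, left)) / (right - left) / k
        return total

    uniform = [random.random() for _ in range(100000)]
    skewed = [random.paretovariate(1.5) for _ in range(100000)]

    for name, data, lo, hi in [("uniform", uniform, 0.2, 0.3),
                               ("skewed", skewed, 1.0, 1.5)]:
        true = sum(lo <= x < hi for x in data) / len(data)
        for k in (1, 10, 100):
            est = est_fraction(bounds(data, k), lo, hi)
            print("%-7s k=%-3d true=%.4f est=%.4f" % (name, k, true, est))

On the uniform column even k=1 is nearly exact; on the Pareto column k=1 is
off by orders of magnitude, and the estimates only settle down as you add
buckets.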

I actually wonder whether there are algorithms for estimating the number of
buckets needed for a histogram to achieve some measurable goal. That would
close the loop. It would be much more reassuring to base the number of
buckets on solid statistics than on hunches, as we already do for the sample
size.
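
Purely hypothetically, the naive version of such a loop might look like this
(reusing bounds() and est_fraction() from the sketch above; the stopping rule
and parameter values are made up):

    def pick_nbuckets(data, max_err=0.005, probes=200, kmax=1024):
        # Double the bucket count until the worst absolute selectivity
        # error over a set of random range probes drops below max_err.
        lo_v, hi_v = min(data), max(data)
        queries = [tuple(sorted((random.uniform(lo_v, hi_v),
                                 random.uniform(lo_v, hi_v))))
                   for _ in range(probes)]
        n = len(data)
        k = 1
        while k < kmax:
            b = bounds(data, k)
            worst = max(abs(est_fraction(b, lo, hi)
                            - sum(lo <= x < hi for x in data) / n)
                        for lo, hi in queries)
            if worst <= max_err:
                break
            k *= 2
        return k

ANALYZE obviously couldn't afford the exact error computation over the whole
table, but the error could presumably be estimated from the sample itself --
and that's the part that would need real statistics behind it.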

> So we really don't have any methodically-gathered evidence about the
> effects of different stats settings. It wouldn't take a lot to convince
> us to switch to a different default, I think, but it would be nice to
> have more than none.

I think the difficulty (aside from testing being laborious at the best of
times) is that the results depend heavily on the data set, and good example
data sets are hard to generate. Offhand I would think the census data might
make a good starting point -- it should have columns ranging from perfectly
uniform to highly skewed.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
