Re: ANALYZE sampling is too good

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Greg Stark *EXTERN*" <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-10 08:28:23
Message-ID: A737B7A37273E048B164557ADEF4A58B17C7DCEF@ntex2010i.host.magwien.gv.at
Lists: pgsql-hackers

Greg Stark wrote:
>> It's also applicable for the other stats; histogram buckets constructed
>> from a 5% sample are more likely to be accurate than those constructed
>> from a 0.1% sample. Same with nullfrac. The degree of improved
>> accuracy would, of course, require some math to determine.
>
> This "some math" is straightforward basic statistics. The 95th
> percentile confidence interval for a sample consisting of 300 samples
> from a population of a 1 million would be 5.66%. A sample consisting
> of 1000 samples would have a 95th percentile confidence interval of
> +/- 3.1%.
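
For reference, those figures are just the textbook margin of error for
an estimated proportion, taking the worst-case p = 0.5. A minimal
Python sketch of the arithmetic, purely for illustration, to make the
assumptions explicit:

    import math

    def margin_of_error(n, N, p=0.5, z=1.96):
        # z = 1.96 is a standard normal quantile, which is where
        # the normality assumption enters.
        fpc = math.sqrt((N - n) / (N - 1.0))  # finite population correction
        return z * math.sqrt(p * (1.0 - p) / n) * fpc

    print(margin_of_error(300, 10**6))   # ~0.0566 -> +/- 5.66%
    print(margin_of_error(1000, 10**6))  # ~0.0310 -> +/- 3.1%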

Doesn't all that assume a normally distributed random variable?

I don't think it can be applied to database table contents
without further analysis.

Yours,
Laurenz Albe
