Re: ANALYZE sampling is too good

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Greg Stark *EXTERN*" <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-10 08:28:23
Message-ID: A737B7A37273E048B164557ADEF4A58B17C7DCEF@ntex2010i.host.magwien.gv.at
Lists: pgsql-hackers

Greg Stark wrote:
>> It's also applicable for the other stats; histogram buckets constructed
>> from a 5% sample are more likely to be accurate than those constructed
>> from a 0.1% sample. Same with nullfrac. The degree of improved
>> accuracy would, of course, require some math to determine.
>
> This "some math" is straightforward basic statistics. The 95th
> percentile confidence interval for a sample consisting of 300 samples
> from a population of a 1 million would be 5.66%. A sample consisting
> of 1000 samples would have a 95th percentile confidence interval of
> +/- 3.1%.
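
For reference, those figures are just the textbook margin of error for
an estimated proportion, taking the worst-case p = 0.5. A minimal
Python sketch of the arithmetic, purely for illustration, to make the
assumptions explicit:

    import math

    def margin_of_error(n, N, p=0.5, z=1.96):
        # z = 1.96 is a standard normal quantile, which is where
        # the normality assumption enters.
        fpc = math.sqrt((N - n) / (N - 1.0))  # finite population correction
        return z * math.sqrt(p * (1.0 - p) / n) * fpc

    print(margin_of_error(300, 10**6))   # ~0.0566 -> +/- 5.66%
    print(margin_of_error(1000, 10**6))  # ~0.0310 -> +/- 3.1%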

Doesn't all that assume a normally distributed random variable?

I don't think it can be applied to database table contents
without further analysis.

Yours,
Laurenz Albe
