Re: Expected accuracy of planner statistics

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"John D. Burger" <john(at)mitre(dot)org>
Cc:	Postgres General <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Expected accuracy of planner statistics
Date:	2006-09-29 15:52:48
Message-ID:	16723.1159545168@sss.pgh.pa.us
Lists:	pgsql-general

"John D. Burger" <john(at)mitre(dot)org> writes:
> Tom Lane wrote:
>> IIRC I picked an equation out of the literature partially on the basis
>> of it being simple and fairly cheap to compute...

> I'm very curious about this - can you recall where you got this, or
> at least point me to where in the code this happens?

src/backend/commands/analyze.c, around line 1930 as of CVS HEAD:

/*----------
 * Estimate the number of distinct values using the estimator
 * proposed by Haas and Stokes in IBM Research Report RJ 10025:
 *		n*d / (n - f1 + f1*n/N)
 * where f1 is the number of distinct values that occurred
 * exactly once in our sample of n rows (from a total of N),
 * and d is the total number of distinct values in the sample.
 * This is their Duj1 estimator; the other estimators they
 * recommend are considerably more complex, and are numerically
 * very unstable when n is much smaller than N.
 *
 * Overwidth values are assumed to have been distinct.
 *----------
 */
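
To make the formula in that comment concrete, here is a minimal C sketch of it. The function name duj1_estimate, the double-typed parameters, and the final clamping of the result to the range [d, N] are assumptions made for this illustration; it is not a copy of the code in analyze.c.

```c
#include <assert.h>

/*
 * Hypothetical helper: Duj1 estimate of the number of distinct values.
 *
 *   n  = number of rows in the sample
 *   N  = total number of rows in the table
 *   d  = number of distinct values seen in the sample
 *   f1 = number of values seen exactly once in the sample
 *
 * Computes n*d / (n - f1 + f1*n/N), as in the quoted comment.
 */
double
duj1_estimate(double n, double N, double d, double f1)
{
    double denom;
    double est;

    /* Basic sanity checks on the inputs (assumed preconditions). */
    assert(n > 0 && N >= n);
    assert(f1 >= 0 && d >= f1 && d <= n);

    /* With the preconditions above, denom is always positive. */
    denom = n - f1 + f1 * n / N;
    est = n * d / denom;

    /*
     * Assumed sanity bounds for this sketch: the table cannot hold
     * fewer distinct values than the sample showed, nor more than
     * it has rows.
     */
    if (est < d)
        est = d;
    if (est > N)
        est = N;

    return est;
}
```

Two limiting cases follow directly from the formula: if no value occurs exactly once (f1 = 0), the estimate is just d; if every sampled value is distinct (d = f1 = n), it comes out as N.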

regards, tom lane

In response to

Re: Expected accuracy of planner statistics at 2006-09-29 14:53:21 from John D. Burger

Responses

Re: Expected accuracy of planner statistics at 2006-09-29 16:37:58 from John D. Burger
