Quick Links

Re: Improving N-Distinct estimation by ANALYZE

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Improving N-Distinct estimation by ANALYZE
Date:	2006-01-04 23:57:49
Message-ID:	20060104235749.GE43311@pervasive.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Jan 04, 2006 at 07:10:29PM +0000, Simon Riggs wrote:
> 3. We should also apply multi-column heuristics to the estimation of D,
> once we have estimated all columns. For column groups (pairs, triples
> etc) that form part of a PK, we know that it must be true that D1 *
> D2 ... Dk >= N. In many cases we will be very confident of our estimate
> of D when we decide = d. i.e. When we have two columns, we can use this
> to infer that D1 = N/d when D2 = d. So we can do this in any case where
> we have confident estimates of all but one column; the required
> information is available at that time.
> e.g. if line_item primary key ( l_orderkey, l_linenumber ) and we know
> that there are at most 10 l_linenumber values in the table, then there
> should be N/10 values for l_orderkey, so set it to that if it is lower
> (only).

Sorry if I'm pointing out the obwious, but I would do this for any
unique index, not just a PK. (It should still hold for any unique index,
right?)

Also, was an approach of sampling random rows within random blocks
considered? Something like:

until row sample size reached
read random block
sample x% of rows in that block randomly
done
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Improving N-Distinct estimation by ANALYZE at 2006-01-04 19:10:29 from Simon Riggs

Responses

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-05 00:24:25 from Simon Riggs

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Simon Riggs	2006-01-05 00:05:00	Re: Improving N-Distinct estimation by ANALYZE
Previous Message	Josh Berkus	2006-01-04 23:25:54	Re: Improving N-Distinct estimation by ANALYZE