Skip site navigation (1)
Skip section navigation (2)
## Re: Improving N-Distinct estimation by ANALYZE

### In response to

### Responses

### pgsql-hackers by date

On Wed, Jan 04, 2006 at 07:10:29PM +0000, Simon Riggs wrote: > 3. We should also apply multi-column heuristics to the estimation of D, > once we have estimated all columns. For column groups (pairs, triples > etc) that form part of a PK, we know that it must be true that D1 * > D2 ... Dk >= N. In many cases we will be very confident of our estimate > of D when we decide = d. i.e. When we have two columns, we can use this > to infer that D1 = N/d when D2 = d. So we can do this in any case where > we have confident estimates of all but one column; the required > information is available at that time. > e.g. if line_item primary key ( l_orderkey, l_linenumber ) and we know > that there are at most 10 l_linenumber values in the table, then there > should be N/10 values for l_orderkey, so set it to that if it is lower > (only). Sorry if I'm pointing out the obwious, but I would do this for any unique index, not just a PK. (It should still hold for any unique index, right?) Also, was an approach of sampling random rows within random blocks considered? Something like: until row sample size reached read random block sample x% of rows in that block randomly done -- Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

- Improving N-Distinct estimation by ANALYZE at 2006-01-04 19:10:29 from Simon Riggs

- Re: Improving N-Distinct estimation by ANALYZE at 2006-01-05 00:24:25 from Simon Riggs

Next: From:Simon RiggsDate:2006-01-05 00:05:00Subject: Re: Improving N-Distinct estimation by ANALYZEPrevious: From: Josh BerkusDate: 2006-01-04 23:25:54Subject: Re: Improving N-Distinct estimation by ANALYZE