Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
Cc: josh(at)agliodbs(dot)com,pgsql-perform <pgsql-performance(at)postgresql(dot)org>,pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date: 2005-04-27 13:43:31
Message-ID: 426F9703.4010108@dunslane.net (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance

Mischa Sandberg wrote:

> 
>Perhaps I can save you some time (yes, I have a degree in Math). If I 
>understand correctly, you're trying extrapolate from the correlation 
>between a tiny sample and a larger sample. Introducing the tiny sample 
>into any decision can only produce a less accurate result than just 
>taking the larger sample on its own; GIGO. Whether they are consistent 
>with one another has no relationship to whether the larger sample 
>correlates with the whole population. You can think of the tiny sample 
>like "anecdotal" evidence for wonderdrugs.  
>
>  
>

Ok, good point.

I'm with Tom though in being very wary of solutions that require even 
one-off whole table scans. Maybe we need an additional per-table 
statistics setting which could specify the sample size, either as an 
absolute number or as a percentage of the table. It certainly seems that 
where D/N ~ 0.3, the estimates on very large tables at least are way way 
out.

Or maybe we need to support more than one estimation method.

Or both ;-)

cheers

andrew



In response to

Responses

pgsql-performance by date

Next:From: mmirandaDate: 2005-04-27 14:59:41
Subject: Re: Final decision
Previous:From: Yann MichelDate: 2005-04-27 13:31:39
Subject: Re: What needs to be done for real Partitioning?

pgsql-hackers by date

Next:From: Brent VernerDate: 2005-04-27 14:13:02
Subject: Re: [proposal] protocol extension to support loadable stream filters
Previous:From: Simon RiggsDate: 2005-04-27 13:13:42
Subject: Re: possible TODO: read-only tables, select from indexes

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group