Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: josh(at)agliodbs(dot)com,pgsql-perform <pgsql-performance(at)postgresql(dot)org>,pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date: 2005-04-27 05:38:04
Message-ID: 1114580284.426f253cc0087@webmail.telus.net (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Quoting Andrew Dunstan <andrew(at)dunslane(dot)net>: 
 
> After some more experimentation, I'm wondering about some sort of  
> adaptive algorithm, a bit along the lines suggested by Marko 
Ristola, but limited to 2 rounds. 
>  
> The idea would be that we take a sample (either of fixed size, or 
> some  small proportion of the table) , see how well it fits a larger 
sample 
> > (say a few times the size of the first sample), and then adjust 
the > formula accordingly to project from the larger sample the 
estimate for the full population. Math not worked out yet - I think we 
want to ensure that the result remains bounded by [d,N]. 
 
Perhaps I can save you some time (yes, I have a degree in Math). If I 
understand correctly, you're trying extrapolate from the correlation 
between a tiny sample and a larger sample. Introducing the tiny sample 
into any decision can only produce a less accurate result than just 
taking the larger sample on its own; GIGO. Whether they are consistent 
with one another has no relationship to whether the larger sample 
correlates with the whole population. You can think of the tiny sample 
like "anecdotal" evidence for wonderdrugs.  
--  
"Dreams come true, not free." -- S.Sondheim, ITW  


In response to

Responses

pgsql-performance by date

Next:From: Greg StarkDate: 2005-04-27 05:59:30
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Previous:From: Tom LaneDate: 2005-04-27 04:14:36
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

pgsql-hackers by date

Next:From: Greg StarkDate: 2005-04-27 05:59:30
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Previous:From: Christopher Kings-LynneDate: 2005-04-27 04:29:39
Subject: Re: Disable large objects GUC

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group