Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: josh(at)agliodbs(dot)com
Cc: Greg Stark <gsstark(at)mit(dot)edu>,Marko Ristola <marko(dot)ristola(at)kolumbus(dot)fi>,pgsql-perform <pgsql-performance(at)postgresql(dot)org>,pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date: 2005-04-24 04:48:59
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackerspgsql-performance
Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Overall, our formula is inherently conservative of n_distinct.   That is, I 
> believe that it is actually computing the *smallest* number of distinct 
> values which would reasonably produce the given sample, rather than the 
> *median* one.  This is contrary to the notes in analyze.c, which seem to 
> think that we're *overestimating* n_distinct.  

Well, the notes are there because the early tests I ran on that formula
did show it overestimating n_distinct more often than not.  Greg is
correct that this is inherently a hard problem :-(

I have nothing against adopting a different formula, if you can find
something with a comparable amount of math behind it ... but I fear
it'd only shift the failure cases around.

			regards, tom lane

In response to


pgsql-performance by date

Next:From: Marko RistolaDate: 2005-04-24 07:15:03
Subject: Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Previous:From: Thomas F.O'ConnellDate: 2005-04-24 02:33:00
Subject: Re: pgbench Comparison of 7.4.7 to 8.0.2

pgsql-hackers by date

Next:From: Tom LaneDate: 2005-04-24 05:52:16
Subject: Re: Wierd performance issue with 8.1cvs
Previous:From: Paul TillotsonDate: 2005-04-24 02:00:30
Subject: Re: possible TODO: read-only tables, select from indexes

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group